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INTRODUCTION 


During  the  last  five  years  research  has  been  performed  under 
this  contract  on  the  problem  of  stabilization  and  control  of  nonlinear 
stochastic  systems  observed  by  noisy  measurement  data,  and  the 
difficulties  encountered  in  processing  this  noise-contaminated  measure¬ 
ment  data  to  obtain  accurate  estimates  of  the  state  of  the  system. 

Such  problems  are  inherent  in  many  Air  Force  system  applications. 

If  it  is  possible  to  estimate  the  state  of  the  system  accurately ,  then 
well-known  classical  deterministic  control  techniques  may  often  be 
used  to  give  adequate  system  performance.  This  approach  will 
greatly  reduce  the  complexity  of  the  control  algorithm  over  that 
required  by  a  truly  "optimal11  stochastic  control  policy.  On  the  other 
hand,  the  use  of  nonlinear  filtering  techniques  in  place  of  the  simpler, 
linearized  or  extended  Kalman  filters  can  greatly  increase  the  accu¬ 
racy  of  the  estate  estimates  and  thereby  improve  system  performance 
and  alleviate  divergence  problems. 

Investigation  of  the  use  of  these  nonlinear  filters  to  help  in 
the  control  of  nonlinear  dynamic  stochastic  systems  has  also  been 
performed.  The  approach  has  been  to  utilize  the  nonlinear  filters 
in  a  feedback  loop  to  help  obtain  better  estimates  of  state  and  to 
use  these  estimates  to  generate  "dual”  control  laws. 

The  work  has  been  summarized  each  year  in  an  interim  report 
and  in  publications  in  the  open  literature.  The  work  done  on  the 
contract  will  also  be  summarized  below  and  the  publications  listed  in 
Section  3.  In  addition,  work  in  progress  will  be  described.  This  is 
work  that  should  lead  to  publications  but  which  has  not  yet  been 
accepted  for  publication.  Two  efforts  in  this  category  are  contained 
in  Sections  5  and  6.  Lastly,  but  perhaps  of  just  as  much  importance, 
is  the  spill-over  effect  which  has  occurred  in  this  basic  research  to 
the  more  real-world  contract-related  work  performed  by  ORINCON 
Corporation.  In  particular,  three  such  contract  areas  will  be  cited 
below.  Each  of  these  is  directly  built  upon  the  techniques,  capa¬ 
bilities,  and  experience  gained  by  performing  the  basic  research 
work  for  AFOSR.  This  is  covered  in  Section  2  which  follows. 
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SPINOFFS  FROM  THIS  AFOSR  CONTRACT 


Nonlinear  estimation  /stochastic  control  techniques  have  been 
used  in  two  recent  aerospace-related  efforts.  In  a  project  for  the 
Naval  Weapons  Center  at  China  Lake,  California,  ORINCON  developed 
an  algorithm  for  ghost  discrimination  in  the  jammer  localization  problem. 
The  heart  of  this  algorithm  is  a  bank  of  extended  Kalman  filters  which 
tracks  every  possible  target  candidate  (actual  jammers  and  ghosts) . 
Then,  the  outputs  of  these  filters  are  processed  to  correlate  the 
observed  motion  of  the  targets  with  tracking  aircraft  maneuvers. 

The  degree  of  correlation  allows  each  target  to  be  labeled  as  either 
a  jammer  or  ghost.  The  details  of  this  work  are  contained  in  the 
ORINCON  final  report  on  that  project. 

In  a  basic  research  effort  for  the  Armament  Division  (DLMA) 
at  EgUn  AFB ,  Florida,  entitled  "Tactical  Missile  Guidance  with  Uncer¬ 
tain  Measurements , "  ORINCON  is  currently  developing  a  series  of 
guidance  laws  for  short  range  air-to-air  missiles  using  stochastic 
control  methods.  To  date  we  have  developed  a  dual-control  type  law 
which  explicitly  enforces  zero  expected  final  miss  while  minimizing 
a  combination  of  the  trace  of  the  final  state  error  covariance  matrix 
and  a  standard  quadratic  control  term.  This  law  has  been  success¬ 
fully  tested  with  a  simple  planar  problem.  We  have  also  developed 
a  new  technique  for  propagating  an  approximation  of  the  state  proba¬ 
bility  density  funtion  forward  in  time  while  using  measurements  to 
update  this  density  function.  We  anticipate  that  this  method  will 
form  the  basis  for  a  new  stochastic  control  algorithm. 

In  several  related  contracts  for  the  Defense  Advanced  Research 
Projects  Agency,  ORINCON  has  developed  an  approach  and  a  software 
test  bed  implementing  many  of  the  concepts  defined  in  Publication  #11. 
This  has  been  extended  to  not  only  multiple  targets,  clutter  and  multiple 
sensors  but  even  to  different  sensor  types.  The  basic  approach  is 
that  of  Bayesian  (Gaussian  sum  or  Gaussian  mixture)  nonlinear  problems. 
This  consists  of  banks  of  Kalman  filters  and  multiple  hypothesis  testing. 
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It  is  no  exaggeration  to  say  that  this  work  is  a  direct  follow-on  of 
the  work  discussed  in  Reference  11  which  arose  directly  from  sug¬ 
gestions  of  CoL  Allen  Dayton  when  he  was  in  AFOSR. 
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3,  PUBLICATIONS  CREDITED  TO  THE  CONTRACT 

Work  on  this  contract  has  allowed  continued  investigation  into 
the  development  and  use  of  nonlinear  filters  in  conjunction  with  deter¬ 
ministic  control  laws.  Much  of  the  detail  on  the  work  performed 
under  Contract  F44620-75-C-0023  is  contained  in  the  publications  listed 
below,  all  of  which  have  been  published,  accepted  or  submitted  for 
publication  since  the  beginning  of  this  contract. 

1.  D.  L.  Alspach  and  R.  N.  Lobbia,  "A  score  for  correct  data 
association  in  multitarget  tracking,"  an  invited  paper  in  the 
1979  Proceedings  of  the  IEEE  Decision  and  Control  Conference 
in  Orlando,  Florida,  in  December,  1979. 

2.  D.  L,  Alspach,  "A  Gaussian  sum  Bayesian  approach  to  passive 
bearings-only  tracking,"  invited  paper.  Proceedings  of  the 
Office  of  Naval  Research  Sponsored  Conference  on  Target 
Motion  Analysis,  Monterey  Naval  Postgraduate  School,  25-27 
May  1977. 

3.  D.  L.  Alspach  and  H.  W.  Sorenson,  "Approximate  solutions 

of  the  nonlinear  filtering  equations,"  invited  chapter  in  forth¬ 
coming  book.  Nonlinear  Estimation  and  Filtering  Theory:  A 
Status  Review,  E.  Stear  (EdP,  to  be  published  by  Marcel 
Dekker . 

4.  D.  L.  Alspach,  "A  discussion  of  the  relationships  between 

the  dual  goals  of  stochastic  control,"  An  International  Journal 
of  Computers  and  Electrical  Engineering,  4:1,  January  1977. 

5.  D.  L.  Alspach,  "A  stochastic  regulator  using  a  certainty  equiva¬ 
lence  control  with  a  nonlinear  filter  for  processing  hard-limited 
data,"  Information  Sciences.,  Vol.  13,  1978. 

6.  L.  L.  Scharf  and  D.  L.  Alspach,  "Nonlinear  state  estimation 
in  observation  noise  of  unknown  covariance,"  International 
Journal  of  Control,  1978. 

7.  L.  L.  Scharf  and  D.  L.  Alspach,  "Nonlinear  state  estimation 
in  observation  noise  of  unknown  covariance,"  Proceedings  of 
the  1976  Joint  Automatic  Control  Conference  (West  Lafayette^ 
Indiana,  July  27-30,  1976Y. 

8.  D.  L.  Alspach,  "Nonlinear  filters  in  feedback  control," 

Proceedings  of  the  Sixth  Symposium  on  Nonlinear  Estimation 
Theory  and  Its  Applications ,  San  Diego  ,  California  (September 
197577 -  - 
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D.  L.  Alspach,  "A  certainty  equivalence  control  with  a  non¬ 
linear  filter  in  the  feedback  loop,"  Proceedings  of  the  1975 
IEEE  Symposium  on  Decision  and  Control,  Houston,  Texas 
(December  10-12,  1975). 

10.  D.  L.  Alspach,  "A  stochastic  control  algorithm  for  systems 
with  control  dependent  plant  and  measurement  noise,"  An 
International  Journal  of  Computers  and  Electrical  Engineering, 
2:4,  November  1975. 

11.  D.  L.  Alspach,  "A  Gaussian  sum  approach  to  the  multitarget 
identification-tracking  problem,"  Automatica ,  Vol.  11,  pp. 
285-296  (August  1975). 


PUBLICATIONS  SUMMARIES 


Following  are  brief  summaries  of  the  work  contained  in  the 
publications  listed  in  the  previous  section  of  this  report. 

The  multitarget  tracking  problem  is  defined  in  Publication  #1 
as  that  of  taking  a  number  of  measurements  obtained  from  several 
sensors  and  determining  track  estimates  for  any  targets  that  are 
"heard"  by  these  sensors.  In  the  real  world  environment  the  mea¬ 
surements  are  cluttered  by  random  noise.  In  these  situations,  it 
is  difficult  to  determine  precisely  which  target  (if  any)  corresponds 
to  each  measurement.  Typical  problems  which  arise  with  tracking 
algorithms  include:  too  few  tracks  are  formed;  too  many  tracks  are 
formed  (false  tracks);  and,  inaccurate  position,  course,  and  speed 
estimates  are  reported.  The  above  difficulties  are  often  the  result 
of  incorrect  allocation  of  data  to  individual  tracks .  Algorithms ,  while 
etimating  the  motion  of  a  given  target,  inadvertently  mix  in  clutter 
and/or  measurements  from  another  target. 

In  order  for  a  correct  allocation  to  be  made,  we  must  have 
an  effective  scoring  formula;  i.e.,  some  means  of  determining  how 
likely  a  given  assignment  of  data  is.  To  be  effective,  a  scoring 
formula  must  produce  (on  the  average)  a  better  score  for  correct 
assignments  than  for  incorrect  assignments.  Information  useful  in 
the  scoring  process  includes  a  priori  intelligence  data  (such  as  initial 
target  locations)  ,  models  of  target  motion,  models  of  the  transmission 
channel,  and  expected  moments  of  clutter  for  the  sensor  gain  setting 
being  used.  Basically,  the  score  is  derived  from  the  residuals  which 
come  out  of  the  processing  of  a  batch  of  data  with  the  extended 
Kalman  filter.  This  is  used  to  evaluate  the  likelihood  of  potential 
tracks.  Although  "likelihood"  has  a  useful  intuitive  meaning,  we  use 
the  term  to  mean  the  probability  density  function  p(A)  of  the  track  A. 
The  expected  cost  of  a  given  assignment  is  derived  with  the  theory 
of  extremals  being  used  to  obtain  the  expected  cost  of  adding  a 
clutter  point  in  a  track. 
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In  Publication  #2,  a  specific  application  of  the  use  of  Gaussian 
^  sums  to  the  bearings -only  target  motion  analysis  problem  was  pre¬ 

sented  at  the  Naval  Postgraduate  School  in  Monterey.  This  invited 
paper  was  published  in  the  Proceedings  of  the  conference,  the  main 
theme  of  which  was  bearings-only  target  motion  analysis. 

A  summary /review  paper  (Publication  #3)  was  prepared  as  an 
invited  chapter  of  a  text  on  nonlinear  estimation,  edited  by  Dr.  E. 
Stear,  now  Chief  Scientist  of  the  Air  Force.  The  paper  is  entitled 
"Approximate  Solutions  of  the  Nonlinear  Filtering  Equations"  and  the 
book  will  be  entitled  Nonlinear  Estimation  and  Filtering  Theory:  A 
Status  Review.  The  work  done  on  the  contract  to  date  including  that 
in  many  of  the  above  publications  is  summarized  in  this  review  chapter, 
as  well  as  work  by  other  workers  in  the  field.  Due  to  Dr.  Stear's 
heavy  commitment  of  time  to  his  duties  in  working  for  the  Air  Force, 
this  book  has  not  yet  been  published.  For  this  reason  this  paper  is 
attached  as  an  appendix  to  this  final  report. 

A  general  philosophical  approach  to  stochastic  control  is  dis¬ 
cussed  in  Publication  #4  above.  The  method  of  aligning  the  "dual" 
goals  of  the  general  optimal  stochastic  control  as  a  design  tool  is 
discussed.  When  the  two  goals  are  exactly  aligned,  the  certainty 
equivalence  control  is  optimal  and  no  additional  intentional  "probing" 
is  required.  If  these  goals  are  "anti-aligned , "  demanding  opposing 
controls,  the  certainty  equivalence  control  can  be,  locally  at  least, 
the  worst  control  possible.  An  example  with  this  characteristic  has 
been  discussed  in  Publication  #4. 

In  Publication  #5  we  consider  a  simple  example  of  a  nonlinear 
filter  in  a  feedback  loop.  For  this  case  the  "dual  goals"  are  aligned 
and  it  is  shown  that  the  performance  of  the  system  is  very  close  to 
that  of  an  optimal  stochastic  control.  Since  the  optimal  stochastic 
control  algorithm  is  very  difficult  to  calculate,  this  is  done  by  com¬ 
paring  the  performance  to  a  known  lower  bound.  This  lower  bound 


is  found  in  the  following  manner.  It  is  clear  that  there  is  more 


information  about  the  state  in  a  linear  measurement  of  the  state 


contaminated  by  noise  than  in  the  hard-limited  version 

>’HL(k): 

zLlN(k>  =  Hkxk  +  vk 

yHL(k)  =  SIGN(2LIN(k))  =  SIGN(Hkxk  +  vR)  . 


With  the  linear  measurement  function,  the  optimal  stochastic  control 
is  given  by  the  separation  theorem.  It  is  clear  that  the  performance 
of  the  optimal  stochastic  control  system  with  only  the  hard-limited 
function  available  will  be  worse  than  the  system  with  the 

linear  function  (k) .  It  is  shown  in  Publication  #5  that  given 

only  yHL(k)  ,  the  performance  of  the  certainty  equivalence  control 
with  a  nonlinear  filter  is  close  to  that  of  the  optimal  control  system 
with  the  better  linear  measurement. 

In  Publications  #6  and  #7,  a  new  nonlinear  filter  was  developed 
in  conjunction  with  Dr.  L.  Scharf  of  Colorado  State  University.  This 
paper  considers  the  adaptive  Kalman  filtering  problem  where  only  the 
measurement  noise  covariance  is  unknown.  A  new  parallel  filtering 
algorithm  is  developed. 

In  Publication  #8,  the  effect  of  control-dependent  plant  and 
measurement  noise  on  the  feedback  control  is  discussed.  It  is  shown 
that  the  goals  are  effectively  aligned,  but  not  of  the  same  magnitude. 
The  use  of  any  control  action  reduces  the  accuracy  of  the  state  esti¬ 
mates.  Thus,  the  effect  of  such  control-dependent  noise  is  to  induce 
"caution11  on  the  control. 

In  Publication  #9,  the  filter  for  processing  hard-limited  mea¬ 
surement  data  was  first  introduced.  In  Publication  #10,  a  filter  for 
state  estimation  in  systems  with  state-  and  control-dependent  mea¬ 
surement  noise  was  introduced.  In  Publication  #11,  Gaussian  sum 
filters  are  used  in  a  Bayesian  approach  to  the  multitarget  tracking 
problem . 
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EFFECTS  OF  COMBINATION  OF  MEASUREMENTS  ON  SYSTEM 
STATES  REFERENCING  NONLINEAR  FILTERING 


Some  recent  work  on  this  contract  has  been  the  study  of  cer¬ 
tain  very  simple  types  of  measurement  data  that  can  lead  to  states  of 
knowledge  (  a  posteriori  density  function)  which  cannot  be  described 
*  by  linearized  filtering  theory.  These  lead  to  very  non-Gaussian 

a  posteriori  density  functions  in  the  only  reasonable  coordinates 
system  possible. 

^  The  effects  of  n  asurements  such  as  bearing,  speed,  relative 

time  delay  and  rate  of  change  of  relative  time  delay  on  state  estimate 
has  been  considered.  In  many  systems  in  which  radars  of  various 
types,  acoustic  transducers,  other  satellite  sensors,  etc.,  are  used 
^  to  generate  measurements  on  one  or  more  targets,  measurements  of 

the  above  type  are  produced.  Great  difficulty  can  be  encountered 
by  anyone  blindly  trying  to  apply  linearized  Kalman  filter  techniques 
to  this  type  of  data.  Some  simple  examples  of  how  this  can  come 
^  about  are  described  below. 

First,  however,  let  us  remind  ourselves  of  the  basic  structure 
of  a  discrete  time  extended  Kalman  filter.  It  consists  of  a  set  of 
recursion  relations  for  propagation  forward  in  time  a  state  estimate 

>  and  a  covariance  matrix.  If  no  modeling  errors  occur  (the  nonlinear¬ 
ities  are  known  exactly  and  the  noise  statistics  are  correctly  modeled) 
filtering  errors  still  occur  in  the  application  of  the  linearized  extended 
Kalman  filter.  In  considering  such  filters  one  often  performs  sensi- 

►  tivity  analysis  and  studies  of  regions  of  validity  of  the  linearizations 
inherent  in  both  the  prediction  and  measurement  (filtering)  stages. 
These  techniques  are  well  documented  and  will  not  be  repeated  here. 
However,  an  even  more  fundamental  question  is  "do  the  measurements 
and  dynamics  even  lead  to  a  roughly  unimodal,  elliptically  shaped 
region  in  state  space?"  If  a  bimodal  or  completely  non-elliptically 
shaped  region  is  given  as  the  region  with  non-neghgible  probability 
mass  from  a  correct  utilization  of  the  Bayesian  recursion  relations, 
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one  must  either  find  a  different  coordinate  system  (where  this  is  the 
case)  or  give  up  on  using  a  simple  extended  Kalman  filter.  First, 
consider  a  very  simple  example  of  four  states  with  linear  system 
dynamics . 


[Xj  x2  x3  x4l  =  [x,  X,  y,  y] 
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Consider  first  the  case  where  we  have  a  very  small  amount  of 
dri/ing  noise  and  continuous  speed  measurements.  If  the  target  was 
knowrn  to  be  at  the  origin  at  time  tg,  its  state  trajectory  would  be 
limited  to  the  circle  shown  in  Figure  1.  If  the  initial  uncertainty 
was  the  ellipse  shown  in  Figure  1,  its  state  would  have  to  lie  within 
an  annulus  indicated  in  Figure  2  and  shown  more  completely  in 
Figure  3.  If  this  probability  mass  was  cut  by  a  bearing  from  a 
sensor  at  location  A,  the  a  posteriori  density  will  have  non-negligible 
mass  as  shown  in  Figure  4.  If  it  is  hard  to  imagine  a  state  space 
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Figure  4, 


Bearing  measurement. 
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for  which  this  area  could  be  well  defined  by  a  Gaussian  density. 

But  this  is  just  what  would  be  necessary  in  order  to  find  a  coordinate 
system  for  which  an  extended  Kalman  filter  would  be  a  reasonable 
filtering  algorithm.  It  should  be  noted  that  a  two-element  mixture 
density  or  Gaussian  sum  density  might  be  very  adequate  (though 
not  perfect)  in  describing  the  state  of  knowledge  indicated  in  Figure  4. 
Also,  a  Gaussian  density  in  range  could  perhaps  be  used  to  describe 
the  state  of  knowledge  described  in  Figure  3.  Thus,  one  might  have 
to  change  coordinate  systems  with  the  data  accumulated  and  switch 
from  simple  extended  Kalman  filters  to  Gaussian  sum  filters  or  banks 
of  Kalman  filters  to  adequately  process  data  for  this  very  simple 
situation.  It  should  be  noted  that  going  to  three  dimensions  would 
not  greatly  change  the  character  of  this  example. 

A  second  type  of  measurement  available  from  pairs  of  sensors 
(LORAN,  etc.)  is  the  time  delay  difference  between  a  signal  arriving 
at  each  sensor  (bistatic).  Thus,  if  two  sensors — A^  and  A2 — are 
located  as  shown  in  Figure  5,  lines  of  constant  time  delay  difference 
are  shown  in  Figure  6.  The  third  dimension  would  be  handled  by 
rotating  these  lines  around  the  lines  joining  sensors  A^  and  A2. 

A  small  amount  of  uncertainty  is  indicated  in  Figure  7  for  a  very 
small  negative  value  of  time  delay  difference.  For  a  slightly  larger 
positive  time  delay  difference  with  small  uncertainty,  the  probability 
mass  is  shown  in  Figure  8.  If  a  bearing  measurement  is  processed 
as  shown  the  a  posteriori  density  (the  interceding  area)  could  very 
well  be  approximated  by  a  single  Gaussian.  Thus,  an  extended 
Kalman  filter  could  very  well  be  used  to  process  this  data  with  the 
possibility  that  the  result  would  be  close  to  that  obtained  from  an 
optimal  non-linear  filter. 

A  case  of  slightly  large  positive  time  delay  difference  and 
different  sensor  position  is  shown  in  Figure  9.  In  this  case  no 
extended  Kalman  or  linearized  filters  could  give  a  reasonable  solu¬ 
tion  for  the  real  a  posteriori  density.  Again,  however,  a  two  term 
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Gaussian  sum  density  could  well  be  used  as  an  adequate  approximation 
to  the  density  function  which  would  result  from  an  optimal  nonlinear 
filter.  Note  in  Figure  10  that  the  same  effect  would  occur  if  an 
a  priori  density  (shown)  was  followed  by  the  same  time  delay  differ¬ 
ence  measurement  as  in  Figure  9.  Thus,  in  these  last  few  examples 
for  the  same  type  of  data,  in  some  cases  an  extended  Kalman  filter¬ 
ing  approach  could  work  very  well  and  in  other  differing  really  only 
in  the  measurement  data  itself,  no  linearized  filter  could  give  reason¬ 
able  results.  Note  that  a  linearized  filter  could  be  developed  and 
run  on  a  number  of  test  cases  with  good  results  without  even 
encountering  these  problems.  Due  to  the  robust  character  of  the 
Kalman  filter  structure,  when  such  a  filter  was  used  in  a  real  system 
where  both  types  of  "good  and  bad"  data  types  were  encountered, 
this  filter  might  be  used  without  the  realization  by  the  users  that  in 
certain  cases  greatly  sub-optimal  results  were  being  obtained.  For 
example,  in  Figure  9  one  would  probably  obtain  an  a  posteriori 
Gaussian  density  near  one  of  the  two  possible  intersections  of  the 
hyperbola  and  the  bearing. 

These  examples  indicate  the  kind  of  danger  that  is  routinely 
encountered  in  implementing  linearized  filters  without  great  insight 
to  the  nature  of  the  measurements  and  dynamics  of  the  system 
under  investigation.  In  both  of  the  above  examples,  one  might 
"feel"  that  the  problem  is  obvious  and  a  careful  patch- up  of  a  linearized 
system  could  be  made  by  a  good  analyst.  A  number  of  cases  observed 
in  the  real  world  and  in  publications  lead  us  to  believe  that  even  this 
is  rarely  the  case.  However,  one  more  example  will  be  shown  where 
the  same  effect  is  seen  but  the  result  is  not  so  a  priori  obvious. 
Consider  a  case  such  as  LORAN  where  not  only  the  time  delay  differ¬ 
ence  is  measured  but  also  the  derivative  of  time  delay  difference  is 
available  as  a  measurement. 

For  this  case  it  can  be  seen  that  this  measurement  is  bilinear 
in  the  x  and  y  rate  state  variables: 


R^(x,y)  -  R2(x,y) 


x  g , (x,y)  +  y  g-(x»y) 


c  =  speed  of  light. 


It  is  possible  to  look  at  contour  plots  of  g^(x,y)  and  g^(x,y)  in 
Figures  11,  12,  and  13. 

We  will  consider  the  case  where  we  also  have  some  knowledge 
of  the  maximum  speed  of  any  target  of  interest  for  simplicity. 


S=>/x2  +  y2  ^  Sr 


One  can  define  the  a  posteriori  density  of  a  target  in  x,y  space  for 
a  given  measurement  value.  For  example,  if  the  measurement  is  near 
zero  and  we  know  that  from  the  measurements  on  past  data  that  the 
target  velocity  is  defined  by: 


x  -  S  ,  y  ~  0 ,  z  ~  0 
max  7 


For  a  defined  measurement  uncertainty  the  a  posteriori  distribution 
is  shown  in  Figure  14  as  a  contour  plot  and  in  Figure  15  as  a  three- 
dimensional  plot.  Note  the  fact  that  two  distinct  areas  are  defined. 
For  a  measurement  of  .1  of  zmax*  f°ur  distinct  ridges  are  obtained 
as  shown  in  Figure  16.  When  this  is  raised  to  .  8  of  the  maximum 
possible,  an  annulus  is  formed  as  in  Figure  17  as  a  contour  plot  and 
in  Figure  18  as  a  three-dimensional  plot.  Note  if  the  derivative  of 
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Figure  17.  Derivative  of  time  delay  difference 
^  .8  x  maximum. 
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time  delay  is  measured  to  be  .5  times  the  maximum  and  a  time  delay 
from  Figure  _ is  measured,  the  two  distinct  areas  shown  in  Fig¬ 

ure  19  are  obtained.  Note  this  could  again  be  approximated  by  a 
two-term  Gaussian  sum  density.  The  same  type  of  effect  is  seen 
for  this  case  (x  s  max,  y  *  0)  in  Figure  20  for  doppler  difference 
of  .  1  of  the  maximum  possible  and  the  same  time  delay  difference 
as  Figure  19. 

For  nearly  the  same  geometry  but  for  a  target  with 
y  *  max  ±  10%  uncertainty 
x  «  0  ±  10%  uncertainty 

we  obtain  quite  different  results.  Thus,  if  the  measurement 
z  25  0 

the  a  posteriori  density  is  shown  in  Figure  21  as  a  contour  plot  and 
in  Figure  22  as  a  three-dimensional  plot.  When  z  goes  to  .05  of  the 
maximum  the  contour  plot  becomes  as  shown  in  Figure  23  and  at  z  of 
.1  of  the  maximum  we  obtain  Figure  24  as  a  contour  plot  or  Figure  25 
as  a  not  too  well  executed  (due  to  problems  with  the  three-dimensional 
package)  three-dimensional  plot. 

When  any  of  these  are  combined  with  a  bearing  or  time  delay 
difference  hyperbola  we  again  obtain  cases  where  a  set  of  Gaussian 
sum  densities  could  give  a  quite  reasonable  approximation  to  the 
a  posteriori  density.  However,  it  is  hard  to  believe  that  a  simple 
modification  to  linearized  filtering  theory  could  lead  to  an  adequate 
solution  to  the  filtering  problem  posed  by  the  intermixing  of  this 
type  of  data.  This  is  true  even  though  these  measurements  are 
very  simple  and  mimic  those  found  in  a  number  of  real-world  Air 
Force  applications.  There  is  clearly  a  need  to  consider  the  use  of 
non-linear  filters  in  cases  where  the  measurements  are  a  function 
of  range  or  speed  in  conjunction  with  bearing  measurements. 
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6.  OPTIMAL  SEARCH  PATTERNS  FOR  MULTIPLE  TARGETS 

A  problem  of  great  interest  to  the  Air  Force  is  that  of  detecting 
the  presence  of  multiple  targets  in  clutter  from  several  sensors  of 
varying  capability.  Each  sensor  could  have  different  probability  of 
detection/probability  of  false  alarm  characteristics  as  a  function  of 
settable  (or  controllable)  parameters.  Each  sensor  could  also  have 
the  region  of  measurement  space  covered  by  a  controllable  function 
perhaps  with  constraints. 

One  approach  to  this  problem  is  considered  here. 


Let  p(x)  =  probability  density  function  for  location 
of  targets 

u(x)  =  search  effort  density 

Then  the  detection  function  to  be  maximized  by  u(x)  is  given  by: 


D(u)  =  / 

Jx 


[1  -  e  U^x^)  p(x)  dx  , 


where  X  is  the  surveillance  space  to  be  searched. 


Let  U  =  total  maximal  search  effort  available  to  the 

max 


system 


Then,  we  must  satisfy  the  constraint 


u(x)  dx  =  U 
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I 


I 


) 


I 
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The  problem  is  to  select  an  admissible  search  plan  u*(x) 
which  maximizes  D(u)  over  a  specified  surveillance  region.  Or: 


Maximize: 
subject  to 


D(u) 

J[  u ( x)  dx  =  U 

X 


max 


Theorem:  For  the  problem: 


Maximize 
subject  to 


D(u) 


dx 


^  U 

max 


there  exists  a  real  number  X  >  0  (also  known  as  the  Lagrange  multi¬ 
plier)  such  that: 


1.  u*(x)  =  -£nX  +  £np(x) 

X 

=  0 


and 


2. 


u* (x) dx 
X 


U 

max 


where  the  region  IR^  is  defined  by 


on  1RX 
otherwise 


IR^  =  {x  I  In  p(x)  ^  in  X} 


□ 


► 
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Therefore,  to  obtain  a  complete  solution  to  our  problem,  we  must 
first  find  a  X*  >  0  for  which 


/. 


u,  +  (x)  dx  =  U 


max 


and  also  determine  the  search  region  IR^.  The  search  region  ]R^  is 
dependent  on  the  choice  of  X*. 

Unfortunately,  the  task  of  determining  X*  from  the  above 
integral  equation  is  a  non-trivial  one. 

Explicit  analytical  expressions  can  be  derived  for  X*  and  IR^ 
if  the  density  p(x)  is  Gaussian.  In  this  case: 


p(x)  =  c  exp  (-i||x-x||  ) 

P 


where 


x 
P  = 
c  = 


=  mean 


covariance  matrix 
normalizing  constant 


Consider  the  function: 


x,n  p(x)  -  in  X 


in  c  -  in  A  -  i||x-x||2. 


£n  y  "  il!x-x||"_ 
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The  search  region  is  then  defined  by: 


IR  (X)  =  {x|£np(x)  -  £n  X  i  0} 


=  {x|  ||x-x||2  ,  ^  +  2  £n  Y  )  • 

P  A 


Define  the  effort  function  U(X)  by 


U(X)  = 


■/. 


u^(x)  dx 


**<» 


where  R^fX)  is  as  defined  above  and 


u^(x)  =  £n  p(x)  -  £n  X 


i  l|x-x||2_1  +  £n  j  . 
P  1  A 


Thus , 


U(X)  = 


■/ 

^IRx(X) 


Un ! 


i  II x -  x||  dx 

P 


Let  denote  the  volume  of  the  search  region;  that  is, 


i 


dx  . 


Then,  for  n-dimensional  state  vectors. 


■/. 


dz 

IS  1 
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with 


z  =  S  ( x  -  x) 

B2  =  {2 1  II2II2  £  +  2  tn  j  } 

where  matrix  S  is  defined  by 

sTs  =  p_1 

and  |S  |  =  determinant  of  S  . 

=  i/JTpT 

And, 


where 


a)  =  surface  area  of  an  n-dimensiona]  unit  SDhere 
n 


r  (in) 


"„<2> 


h !  ] 


to  (2) 

n 


n+2 

«  2 


hi] 


“n!2> 


hi]2  {£-&} 


n+2  - 

-y-  n+2 

OJ  (2)  r  1  2 

n  In  c  1 


|S|  n(n+l) 


>!] 


The  useful  case  of  search  in  a  plane  (i.e.f  latitude  and 
longitude)  corresponds  to  n  =  2,  and 


U(A)  =  t r  a.o,  Jl  -  P 2  £n  - - 


271X0^2  1  -  p2 


7T  0^2  \l  1  “  p2  {  In  (277X0^2  \^T“  p  2 )  ]  * 


where 


p  =  correlation  coefficient  for  the  Gaussian  density. 


Solving  the  algebraic  equation 


U(A)  =  U 


05^7  *  > 
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Three  intuitively  satisfying  conclusions  follow  from  the  above 
search  patterns: 

1.  In  a  limited  resource  search  effort,  the  optimal  search  is 
confined  to  a  closed  and  bounded  region; 

2.  the  search  effort  is  concentrated  around  our  best 
knowledge  of  the  existing  position  of  the  target.  The 
search  effort  gradually  decreases  away  from  the 
estimated  location;  and 

3.  as  the  available  search  effort  U  is  increased,  the 

max 

optimal  search  region  is  expanded  to  include  more 
surveillance  area. 

It  is  easily  seen  that  the  effort  density  is  non-negative 
(^0)  over  the  restricted  search  region  1R^.  To  see  this,  consider 
for  x  e  IR 

x 


Using  u*(x)  given  above  the  corresponding  detection  proba¬ 
bility  D(u*)  can  now  be  computed. 


[1  -  e  p ( x)  dx 


e  p(x)  dx 


£n  a  +  kn  p (x) 


on  R 


outside  IR 


exp(-£np(x)  -  £nX)  p(x)  dx  -  I  p(x)  dx 


p(x)  dx  -  Xs 


t  ii  ^  r  ^ 

-■2  ||  X”X||  _j 

3  P  dx 


■i  •" 

:  •'m 


d  z  -  X 


V  v '‘A*  f  A/ nA  »> 
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2  n-1  , 

e  r  dr 


-  A*  2  In 


where  =  2  £n  — — 

max 


D  ( u*)  = 


r/.: 


n~ 2  n-2 

(2)  2  y  2  e  y  dy 


r  -i  n/2  , 

-x*[2t"£] 


In  n  =  2  (Planar  search) 


D(u*)  = 


•  2tt  r 

w  Jr  -‘ 


e  ^  dy  - 


2  In  -fj  ]  -I- 

L  x  J  isi 


D(u+)  = 


y  4 


-  A  2  £n 


C  TT 


~i  ^  T*  *  c  1 

2c  1  -  e  -  \  Zln  — — 

L  J  L  x  J  |S  | 


^2c  fl  -  J  —  - 


ic  1  -  v  -  ~  2  A  £n  -^r 
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Note  that  as  u  t  00  »  \  4-  0  and  D(u*)  t  1.  Thus,  in  the 

max 

event  that  unlimited  resources  are  made  available  to  the  searcher, 
the  detection  probability  will  approach  unity. 

Extensions  of  these  results  to  multi-target  search  situations 
are  not  straightforward.  Multiple  targets  in  the  system  can  be 
categorized  into  three  types: 

1.  Existing  known  targets 

2.  New  targets 

3.  False  targets 

One  must  resort  to  acceptable  approximations  to  be  able  to 
obtain  closed  form  solutions  to  the  search  problem.  The  case  of 
known  multiple  targets  can  be  treated  by  using  such  approximations. 
Extensions  of  results  are  not  so  easy  for  cases  in  which  either  new 
targets  are  entering  the  system  or  there  is  a  possibility  of  false 
targets  (such  as  clutter  or  other  large  objects)  .  In  these  cases 
one  must  further  postulate  acceptable  forms  of  probabilistic 
description  of  these  events  or  one  must  use  empirical  definitions 
from  past  knowledge  of  these  events. 

Next,  we  will  consider  the  case  in  which  search  patterns 
are  found  for  known  multiple  targets. 

Suppose  that  there  are  m  targets  in  the  surveillance  area. 

t  h 

The  location  probability  density  function  for  the  i  target  is  given 
by  the  a  posteriori  density  function  p(x.  jz)  ,  when  z  denotes  all 
the  available  information.  For  the  sake  of  notational  simplicity,  this 
a  posteriori  density  will  be  denoted  by  p(x).  The  probability 
density  function  for  locating  a  target  in  the  surveillance  space  can 
then  be  considered  as  a  mixture  or  Gaussian  sum  density  given  by 


m 


p(x)  =  53  ai  P^x) 

i=l 

where  m  =  number  of  targets  in  the  system 

a.  =  probability  that  the  target  is  selected. 


In  the  absence  of  a  priori  knowledge  a.  is  taken  to  be  1/m,  thus 
giving  equal  probability  for  any  target  to  be  selected. 

The  maximum  detection  probability  search  pattern  for  the 
multiple  target  system  is  then  given  by: 


u*(x)  =  £n  p(x)  -  Hn  A*  x  e  IR^ 

=  0  x  t  HR 

r  x 

where 

]R  =  {x  |  p ( x)  2  X*  } 
and  A*  is  given  by 


u*(x)  dx 


U 

max 


To  indicate  the  degree  of  difficulty  in  solving  the  integral 
equation  to  compute  A*,  consider  the  case  in  which  there  are  two 
targets  in  the  system  (m=2).  Assume  that 


i  =  1,  2 


Then  the  integral  U(A)  requires  evaluation  of  the  following  integral: 


'JR 


Zn  CfCz) )  — 


where 


IR  =  I  2  |-$]]z)j  +  £nll+a10  e 


and 


/  -(SL-xO  S  z\ 

\l+a12  e  / 

2  "tn(wJ7^]7pT)| 


12 


e  1  *  P  1 


It  is  not  possible  to  solve  the  above  integral  in  a  closed  form. 

[See  ,  page  00,  eq.  00.  ] 

Thus,  approximations  are  required  to  simplify  the  integer 
even  for  the  simplest  case  of  two  Gaussian  targets.  Two  situations 
where  simplifications  are  possible  are: 

1.  The  estimates  x.  for  target  location  are  separated  suffi¬ 
ciently,  i.e.,  ||x.-Xj||  >>  1;  and 

2.  The  estimates  x.  are  sufficiently  close  together,  i.e., 
||x.-x.||  «  1  ! 

In  the  first  case,  the  search  area  IR^  (to  which  the  search 

effort  will  be  confined)  can  be  decomposed  into  several  disjoint 

sub^regions  IR  .  In  each  of  these  sub-regions  IR  ,  the  effect  of 
x.  x. 

presence  of  targets  other  than  the  i  target  can  then  be  ignored 
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and  closed  from  expressions  for  A*  and  the  corresponding  u*(x)  can 
be  determined. 

In  the  second  case,  the  system  location  density  can  be  approxi¬ 
mated  by  a  single  unimodal  target  which  dominates  the  search  and  the 
solution  described  for  the  single  target  case  remains  valid. 

Under  the  assumption  that 

Hftj  -  *,1  »  i 

it  is  possible  to  select  a  A*  >  0  such  that  the  total  search  region 
IR  can  be  decomposed  into  disjoint  subregions  1RX  . 


Define  L(x)  =  {p^ (x)  /p^  (x)  }  . 


If  p.(x)  has  a  form  which  decreases  at  least  as  fast  as  exp(- \ ||x  ||^)  , 
as  |! x  |j  increases,  then 

f. ( x)  <<  1 
1 

for  all  x  £  IRX. ,  a  neighborhood  of  .  Using  this  approximation  for 
widely  separated  targets  it  is  now'  possible  to  define  IRX^  in  the  fol¬ 
lowing  way.  (Only  Gaussian  densities  are  considered.) 

A 

Let  A  be  the  derived  value  of  the  constant  approximating  the 
Lagrange  multiplier.  Let  x*  be  such  that 


f.(x+)  ^  min  f.(x)  . 


3 


X  E  IRX. 


J 


Define 


JR*. 

>>1 


jx!c.  exp(-i||x-x.||2  )  >  -  -  j 
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Thus  it  is  seen  that 


IR  C  JR 

Xj  X 


since  x  e  IRXj  implies  that 


-2U1X-X  |J  _ 

J  p.  1 
r) 


=C>  'i 

H^>  p(x)  s  A  . 


>) 

)  (1  +  f.(x* 


))  *  A 


-z||x-x.  j[2_1 

e  Pi  •  Cl  +  f.(x))  >  A 


Using  the  results  of  the  single-target 


case : 


L 


U(A)  =  /  u(x)  dx  = 


-£/  • 

i=l  J]R 

x. 


(x)  dx 


n 


-  E  S 


i=l 


P*  TT  0.  C. 

1  >1  *2 


In 


(’*¥**>  \\  '^7?) 


Let  8.  = 


;i  =  71  °i  °i  /l-pf 
*1  2  1 


a.  =  £n  28. 

i  l 


1  +  f.  (x*) 
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At  this  time  numerical  studies  of  this  approach  to  optimal 
search  patterns  implying  optimal  sensor  allocation  are  being 
performed . 

r 


t 
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I .  INTRODUCTION 


Many  physical  systems  are  described  most  appropriately  by  mathe¬ 
matical  models  which  take  into  account  the  stochastic  influences  which 
act  upon  the  system.  Furthermore,  the  behavior  of  the  system  can 
seldom  be  observed  in  such  a  manner  that  its  state  is  known  precisely 
at  each  time.  Instead,  the  system  is  observed  through  the  measurement 
of  variables  which  yield  incomplete  information  and  which  contain 
unknown  or  random  errors.  In  general  one  must  estimate,  from  these 
noisy  data,  the  state  of  the  system  and,  frequently,  take  some  type 
of  "control"  action  based  on  these  estimates. 

The  problem  of  estimating  the  state  of  a  nonlinear  stochastic 
system  from  noisy  measurement  data  has  been  the  subject  of  consider¬ 
able  research  interest  during  the  past  few  years  but,  although  a 
great  deal  has  been  published  on  the  subject,  the  basic  objective  of 
obtaining  a  solution  that  can  be  implemented  in  a  straightforward 
manner  for  specific  applications  has  not  been  satisfactorily  realized. 
This  is  manifested  by  the  fact  that  the  Kalman  filter  equations  [1,2], 
which  are  valid  only  for  linear,  gaussian  systems,  continue  to  be 
widely  used  for  nonlinear,  nongaussian  systems.  Of  course,  continued 
application  has  resulted  in  the  development  of  ad  hoc  techniques,  dis¬ 
cussed  below,  that  have  improved  the  performance  of  the  Kalman  filter 
and  which  give  it  some  of  the  characteristics  of  nonlinear  filters. 


Central  to  the  nonlinear  estimation  and  stochastic  control 
problems  is  the  determination  of  the  probability  density  function  of 
the  state  conditioned  on  the  available  measurement  data.  If  this 
a  posteriori  density  function  were  known,  an  estimate  of  the  state 
for  any  performance  criterion  could  be  determined.  Unfortunately, 
although  the  manner  in  which  the  density  evolves  with  time  and 
additional  measurement  data  can  be  described  in  terms  of  difference 
(or  differential)  equations,  these  relations  are  generally  very 
difficult  to  solve,  either  in  closed-form  or  numerically,  so  that 
it  is  usually  impossible  to  determine  the  a  posteriori  density  for 
specific  applications.  Because  of  this  difficulty,  it  is  natural 
to  investigate  the  possibility  of  approximating  the  density  with 
some  tractable  form. 


II.  THE  GENERAL  PROBLEM 

Many  formulations  of  stochastic  control  problems  are  possible 
and  have  appeared  in  the  literature.  If  the  objective  is  to  determine 
computational  algorithms,  it  is  reasonable  that  attention  should  be 
directed  toward  the  development  of  control  and  estimation  policies  that 
explicitly  assume  that  events  occur  at  discrete  instants  of  time. 

Suppose  that  the  state  vector  £  the  system  evolves  accord¬ 
ing  to  the  nonlinear  stochastic  difference  equation: 


ik,!=4  (2k'  V  !*) 


k  =  0,  1,  ....  N-l 


(1) 


where 


x^  —  n-dimensional  state  of  the  system  at  the  time  t^; 

~  p-dimensional  control  vector  that  acts  on  the  system 

for  h  $  *  <  Vi; 

~  q-dimensional  plant  random  noise  vector  that  acts  on 
the  system  for  t^  <  t  < 


The  random  noise  sequence*  (w^,  w^,  ...  ,  w^)  4  w  is  assumed 
to  have  a  known  probability  distribution  p(w  )  such  that  the 
are  independent  between  sampling  times  [i.e.,  pCw^,  ...,  w^)  = 

p(w^),  pfw^,  ...,  p(w^)  for  all  k].  Sequences  having  this  character¬ 
istic  will  be  referred  to  as  white-noise  sequences.  The  initial 
state  Xq  is  also  considered  to  be  random  variable  with  a  known 
distribution  and  is  taken  to  be  independent  of  the  plant  noise. 

The  behavior  of  the  plant  (1)  is  observed  imperfectly  through 
measurement  quantities  that  are  related  in  a  prescribed  fashion 

to  the  state  but  which  contain  random  errors. 


4  =  ^(4  ;  k  =  0,  1,  ....  N. 


(2) 


Throughout  this  work  a  vector  or  scalar  with  an  algebraic  super¬ 
script  (k)  will  mean  the  total  array  of  such  vectors  which  have 
occurred  at  all  times  up  to  and  including  t^ . 
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where 


~  m-dimensional  vector  of  known  measurement  data  at  the 

time  t,  ; 
k 

r-dimensional  measurement  noise  vector  contaminating 
the  data  at  t^. 

The  noise  sequence  v  is  assumed  to  be  a  white-noise  sequence 
with  a  known  distribution  and  to  be  independent  of  the  initial  state 
and  all  plant  noise. 

The  noise  sequence  v,  and  w  are  taken  to  have  zero  mean  and 

the  initial  state  vector  to  have  mean  x*. 

— o 


£(Xo)  =  V  , 


E(v,,)  =  0  , 


E(wJ  =  0 

A 


The  covariance  of  the  white-noise  sequence  and  defined  by 


E(wk  w.  )  =  Qk  fikj  > 


E^k  ^3  )  '  \  6k3 


and  the  initial  state  covariance  P  1  is  defined  by 

o 


E  (x  -  x’)  (x  -  x’r  =  P  ’ 
1  — o  -o  “O  /  O 


(3) 


(4) 


(5) 


Based  on  the  system  (1)  -  (2),  it  is  possible  to  define  the 
stochastic  control  problem.  Before  doing  so,  it  should  be  noted  that 
it  has  been  assumed  that  the  probability  distributions  for  all  random 
variables  are  known.  It  is  possible  to  consider  a  more  general 
problem  in  which  the  distributions  are  unknown.  This  situation  has 
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been  referred  to  by  Bellman  [3]  as  an  adaptive  control  problem  to 
distinguish  it  from  the  stochastic  problem  that  is  being  formulated 
here. 

1c  +Y 

The  estimation  of  the  state  from  the  data  z_  can  be 

separated  into  three  subproblems : 


1. 

Filtering: 

estimate 

at  the  present  stage  k 

y  -  o 

using  all 

past  and  current  data  z  . 

2. 

Prediction: 

estimate 

x^  at  some  future  stage  k 

y  <  0 

using  all 

available  data  at  stage  k+y,  y  <  0. 

3. 

Smoothing: 

estimate 

x^  at  some  earlier  stage  k 

Y  >  0 

using  all 

available  data  at  stage  k+y,  y  >  0. 

In  the  absence  of  plant  and  measurement  noise,  the  problem  that 


L 


is  considered  below  would  have  the  following  simple,  deterministic 

k 

statement.  Determine  the  state  x^  from  the  measurement  data  z_  . 
When  stochastic  effects  are  included  in  the  model  as  in  (1)  -  (2), 
the  statement  of  the  filtering  problem  comes  to  include  an  element 
of  arbitrariness.  Certainly,  the  basic  objective  is  to  ’'estimate” 
the  state  from  the  data.  However,  the  random  effects  in  the  system 
model  implies  that  redundant  data  must  be  collected  in  order  to 
minimize  the  noise  influence  on  the  estimate.  Now,  it  becomes 
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.  'v.  • 


A 


A 

i  I 


reasonable  to  define  "best"  estimates,  thereby  introducing  the 
arbitrariness  mentioned  immediately  above. 

In  considering  the  question  of  estimating  x^  from  the  mea- 

K  +Y 

surement  data  ,  it  is  necessary  to  specify  the  criterion  that  is 

to  be  used  to  define  a  ffbest"  estimate.  However,  whatever  criterion 

1c  +Y 

is  used,  the  density  function  p(x^/£  )  contains  all  of  the 

information  that  is  required.  In  fact  this  density  function  provides 

the  most  complete  description  of  the  system  that  is  possible.  Thus, 

the  estimation  problem  can  be  approached  from  this  Bayesian  viewpoint 

[4,5]  without  specifying  the  criterion  because  one  is  first  concerned 

with  determining  p(x  /z  1 )  or  a  valid  approximation  to  it. 

k 


The  Bayesian  approach  described  below  applies  to  all  three  of 

the  subproblems  of  filtering,  prediction,  and  smoothing.  The  actual 

calculation  of  the  smoothing  density  is  considerably  more  complicated 

k  +y 

than  the  other  two  while  the  prediction  density  p(x^/  £  )  (Y  <  0) 

follows  simply  from  the  filtering  of  a  posteriori  density  p(x^/  £^) • 
In  the  following  discussion  we  will  specialize  to  the  filtering  and 
prediction  cases  and  most  of  the  research  results  apply  to  these 


For  systems  of  the  type  considered  here  the  a  posteriori 
density  function  p(x^.  /  £  )  provides  the  most  complete  description 
possible  of  the  so-called  state  vector  which  is,  of  course,  a 

random  variable.  p(x^  /  £  )>  on  the  other  hand,  is  a  deterministic 
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function  which  is,  in  theory,  completely  determined  by  the  a  priori 
statistics  of  the  noise  and  initial  state  density  function  pfx^), 
combined  with  the  measurement  data  _z  =  (z^,  z^) . 

If  this  function  is  taken  as  the  description  of  the  state,  it 
reduces  to  the  unit  impulse  at  the  true  state  whenever  perfect 
knowledge  of  the  state  is  obtained.  Thus,  accepting  this  as  a 
valid  definition  of  state,  it  becomes  necessary  to  obtain  explicitly 
the  a  posteriori  density  function  or  a  "good"  approximation  to  it  in 
order  to  solve  both  the  estimation  and  control  problems.  In  fact, 
once  this  is  accomplished  it  is  possible  to  estimate  the  random 
variable  state  according  to  any  criterion  function. 

While  the  a  posteriori  density  provides  a  complete  solution  of 
the  filtering  problem,  it  has  the  disadvantage  that  it  is  a  function 
rather  than  a  finite-dimensional  estimate.  If  the  problem  were 
deterministic,  the  solution  would  be  provided  by  the  vector 
that  satisfied  the  plant  and  measurement  equations  for  all  k. 

A  similar  "solution"  is  commonly  sought  for  the  stochastic  problem. 
To  obtain  estimates  useful,  but  often  arbitrary,  performance 

criteria  are  defined  which  lead  to  "optimal"  estimates  [6]. 

Examples  of  such  criteria  are  the  minimum  mean  square  error, 
minimum  absolute  deviation,  maximum  a  posteriori,  and  maximum 
likelihood.  The  Bayesian  approach  yields  all  of  the  informa¬ 
tion  necessary  to  obtain  estimates  satisfying  any  of  these 
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criteria  so  it  is  not  necessary  at  this  point  to  be  more  specific 
about  the  performance  criterion.  However,  it  may  be  instructive  to 
see  a  mathematical  statement  of  the  estimation  problem  for  the  more 


1 


MINIMUM  ABSOLUTE  DEVIATION 


aMAD  k 

The  estimate  x^  j  ^  of  the  state  x^  based  on  data  _z  is 

chosen  so  that  the  expected  value  of  the  absolute  value  of  the  error 

is  minimized. 


^MAD 

3k  |  k 


u1- E  Z  |xik 

^  i-i 


Note  that  all  of  these  estimates  as  well  as  many  others  can  be 
obtained  from  the  a  posteriori  deviation  function. 


III.  THE  BAYESIAN  APPROACH 

Much  of  the  current  research  on  nonlinear  filtering  is  con¬ 
cerned  with  recursive  formulations  in  which  the  solution  for  the 
solution  of  the  (k-l)th  stage  is  used  to  obtain  the  solution  for  the 
kth  stage.  Only  the  recursive  formulation  shall  be  considered  here. 

A  general  solution  of  the  recursive  filtering  problem  can  be 
obtained  through  Bayes1  rule. 

The  major  feature  which  distinguishes  this  approach  from  other 
possible  approaches  is  the  assumption  of  the  existence  of  well  de¬ 
fined  a  priori  probability  density  functions  for  all  unknown  vectors 
entering  the  plant  or  measurement  equations.  In  the  Bayesian 
procedure  the  measurement  data  is  used  to  modify  the  probability 
density  function  of  the  state  vector  based  on  all  previous  measure¬ 
ments  and  this  a  posteriori  density  function  is  used  together  with 
the  known  dynamical  plant  and  plant  noise  probability  density  function 


to  obtain  the  predicted  density  function  for  the  state  at  the  next 
stage.  Thus  the  probability  density  function  for  state  at  stage  k 
based  on  all  available  measurements  is  calculated  in  a  recursive 


fashion. 

In  the  Bayesian  approach  to  determining  recursive  estimation 
and  control  policies  for  stochastic  systems  one  is  concerned  primarily 
with  the  a  posteriori  density  function,  p(x^|z^),  and  the  one  stage 
predicted  density  function,  pCx^^l^).  These  density  functions 
contain  all  of  the  information  required  for  solution  of  both  problems, 
and  can  be  described  by  a  recursion  relation  that  is  useful  for 
obtaining  recursive  filters  and  closed  loop  control  policies  [4, 5, 7, 8], 
These  recursion  relations  are  given  below: 


p(4li ■  fpfaji'1)  p(ikl4-r  4-i)  dVi  (1I) 

where  the  normalizing  constant  is  given  by: 


and  where 


(12) 


(13) 


r 


f 


Utilization  of  (10)  to  (12)  in  conjunction  with  the  prescribed 

initial  condition  (13)  provides  the  information  required  for  the 

i  k 

determination  of  p(Xjl.z  )  for  any  k.  Thus,  a  general  solution  of 
the  nonlinear  filtering  problem  is  available.  Unfortunately,  the 
actual  evaluation  of  the  Bayesian  recursion  relations  for  a  specific 
nonlinear  system  is  not  accomplished  in  a  trivial  manner.  It  is  to 
the  problem  of  developing  computational  algorithms  for  evaluating 
(10)  to  (13)  for  specific  systems  that  the  remainder  of  this 
discussion  is  directed. 

When  the  system  is  nonlinear  or  when  the  noise  is  nongaussian, 
two  problems  arise.  First  the  integrations  in  the  Bayesian  recursion 
relations  cannot  be  carried  out  in  closed-form  and,  second,  the 
moments  are  not  easily  obtained  from  (10).  The  moments  are  useful  in 
establishing  practical  estimation  and  control  policies  so  their 
determination  is  an  important  consideration.  These  two  aspects 
pinpoint  the  source  of  the  difficulties  involved  in  the  determination 
of  estimation  and  control  policies  for  nonlinear  and/or  nongaussian 
stochastic  systems  when  trying  to  apply  the  Bayesian  method. 


The  densities  an<*  PQSjJ^k-1^  can  written  more 

explicitly  in  the  special  case  that  the  noise  terms  w^  and  v^ 
are  assumed  to  be  gaussian  and  to  enter  equation  (1)  and  (2)  in  an 
additive  fashion.  Then  by  assuming  that  *  and  *  exist,  we 


can  write  these  densities  as 


P^liSk)  -  kv  exp  -  h^)]  ^  U*  -  hk(*k>J 


(14) 


p(^k+ll 


K  exP 


W 


r^+i 


-  4(*k)] 


-1 


[^k+i  - 


4(^k)]i 


(15) 


Given  the  a  priori  density  functions,  the  a  posteriori  density 
functions  can  t>e  determined  for  any  sampling  time  t ^  if 

the  integrations  required  can  be  accomplished  in  a  closed-form.  In 
general,  this  cannot  be  done  and  suggests  that  some  type  of  approxi¬ 
mation  must  be  considered.  When  only  the  first  two  moments  are 
known,  or  the  initial  density  is  nongaussian,  it  is  common  to  approximate 
the  density  as  a  gaussian  with  these  first  and  second  moments. 

Another  method  is  to  linearize  nonlinear  problems  around  a  known 
nominal  and  to  assume  the  noise  is  gaussian  in  the  linearized  problem. 

The  reason  for  wanting  the  problem  in  this  form  is  well  known  [2].  In 
this  case  the  a  posteriori  and  the  predicted  density  functions  are 
Gaussian  and  the  Bayesian  recursion  relations  can  be  solved  in  a 
closed  form.  In  fact,  since  a  Gaussian  density  function  is  completely 
determined  by  its  first  and  second  order  statistics,  the  functional 
recursion  relation  reduces  to  a  recursion  relationship  for  these 
statistics.  These  relations  have  come  to  be  referred  to  as  the 
Kalman  filter  equations  [l,2]. 


IV.  THE  LINEAR  CASE 


In  many  applications  of  the  theory  a  nominal  trajectory  in 
state  space  is  known  or  assumed  and  the  nonlinear  dynamical  and 
measurement  equations  are  linearized  about  this  nominal.  Because 
of  the  comparative  ease  of  solution  of  this  linearized  problem  relative 
to  the  nonlinear  one  and  because  of  its  applicability  in  many  instances, 
this  linear  problem  has  recei^ 
of  Equations  (1)  and  (2)  are: 


^k+l  Fk 


-k 


“k4  +4 


much  attention. 

The 

linear  version 

• 

+  r,  u.  +  w. 

(16) 

k  — k  -k 

;  k  =  0,  1,  ... 

N. 

(17) 

The  assumption  that  w^,  v.  are  independent  white  noise  sequences, 

both  independent  of  Xq>  will  be  made  here,  but  is  not  necessary  [9]. 

F^f  H^,  and  F^  are  known  deterministic  quantities  at  time  t^ 

and  the  statistics  of  w  ,  v  ,  and  x^  are  all  defined  in  Section  II, 

If,  in  addition,  v,  ,  w  ,  and  x^  are  gaussian  random  variables, 

— k  — k  “0 

equations  (10)  -  (13)  can  be  solved  exactly  to  give: 


[Hk  pk  "i  *  \] 1 
k  -  ii  *  hh  -  «k 


U8) 

(19) 


P,  =  P »  -  K.  H.  P' 
k  k  k  k  k 


(20) 


OS' 


,^/Ay\vv; 
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(21) 


P' 

k+1 


F,  P.  F,  +  Q, 
k  k  k  k 


ik.i  ‘  Fk  h  *  rk  2k  <22> 

where 

ik  ■  E(iklik)  <23> 

k  -  E(2klik'1)  <24> 

"" 

pk  ■  E  ^ik-2kH4-ik)TUk_  (25) 

pi  'E  [fik-^H4-|1;)Tli.k-1  (26) 


and  where  can  be  any  function  of  2_  and  the  a  priori  data  and 

thus  can  be  a  function  of  x^. 

These  recursion  relations  are  exact  for  the  Gaussian  problem  and 
are  referred  to  as  the  Kalman  filter  [ 1 , 2 ] •  Several  characteristics 
of  this  filter  should  be  noted.  First,  the  mean  of  the  a  posteriori 
density  x^  always  provides  the  minimum  mean-square  estimate  for  the 
state.  In  this  case  of  linear  systems,  when  the  a  priori  densities  are 
Gaussian,  the  mean  provides  the  optimal  estimate  for  a  large  class  of 
estimation  criteria  [b].  Secondly,  the  P  matrix  arising  in  the 
Kalman  filter  is  the  covariance  of  the  error  in  the  estimate,  and  it 
is  independent  of  all  measurements . 

If  the  noise  is  non-Gaussian  and  the  minimum  mean-square  error 
estimate  is  desired,  the  Kalman  filter  still  provides  the  best  "linear” 
estimate  for  state.  In  this  case,  however,  estimators  with  smaller 
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error  variances  are  possible.  This  can  be  seen  from  the  fact  that, 


in  general,  the  variance  is  a  function  of  the  measurements.  A  simple 
example  shoving  this  is  given  below. 

Consider  the  scalar  one  stage  plant: 


z 


0 


+  vA 


(27) 


where  both  x^  and  are  uniformly  distributed  on  (-1,  1)  with 

variance  a2  =  1/3.  The  approximation  of  xQ  and  by  gaussian 

random  variables  gives: 


Pa(vq)  =  NCv0,  1/3)  =  exp(-l .  5  v*)  /  */2V3  (28) 


PA(X0)  =  N(*o»  V3) 

(29) 

PA(X0iz0}  =  N(V  V  2*  1/6) 

(30) 

Thus  giving  the  linear  or  Kalman  predicted  variance  of  1/6.  The 

exact  value  of  a2  is  plotted  in  Figure  1  versus  and  the 

2 

Kalman  approximation  to  it,  °^airnan  =  1/6,  is  also  indicated. 

With  the  true  distribution  of  xQ  and  vQ  the  measurement  must  be 
contained  in  the  interval  (-2,  2)  while  for  the  same  system  with 
gaussian  noise  any  measurement  is  possible. 

For  both  the  true  and  approximate  cases  the  mean  of  the 
a  posteriori  density  function  is  £q •  This  is  because  the  true 
mean  is  linear  in  for  one  stage.  This  is  not  true  for  subsequent 
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stages  and  as  a  result  the  Kalman  filter  only  gives  the  best  linear 
projection  of  the  best  minimum  variance  estimate  for  more  than  one 
stage.  However,  as  shown  in  Figure  1,  the  major  problem  with  the 
Kalman  filter  is  that  the  variance  calculated  by  it  is  not  a  very 
realistic  approximation  of  the  true  variance.  This  carries  over  to 
the  nonlinear  case  and  has  been  one  of  the  major  problems  of  the 
linearization  procedure  [lO].  Note  that  in  this  simple  example  the 
true  variance  can  be  from  twice  the  Kalman  estimate  to  zero.  If 
one  wants  to  find  an  estimate  for  state,  even  in  the  linear  non- 
gaussian  case,  different  from  minimum  variance,  then  the  Kalman 
estimate  does  not  even  represent  a  best  linear  estimate.  This 
simple  example  has  been  considered  at  length  in  references  [11,12] 
where  a  larger  number  of  stages  and  where  several  approximate 
density  functions  were  compared  with  the  true  one.  Other  cases  of 
linear  systems  with  nongaussian  noise  and  initial  states  are  discussed 
in  references  [11-15].  This  special  case  of  linear  systems  with  non¬ 
gaussian  noise  definitely  requires  nonlinear  processing  of  the  data 
in  order  to  form  optimal  state  estimates. 


1 


* 
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V.  APPROXIMATE  SOLUTION  OF  THE  BAYESIAN  RECURSION  RELATIONS 


As  discussed  in  me  previous  section,  the  only  general  system  for 
which  closed-form  solutions  of  (10)  -  (13)  can  be  found  is  when  the 
plant  and  measurement  equations  are  linear  and  the  statistics  of  the 
noise  and  initial  state  are  Gaussian.  Then,  the  a  posteriori  density  is 
Gaussian  and  the  conditional  mean  and  covariance  are  described  by  the 
Kalman  filter  equations.  Unfortunately,  it  is  necessary  to  seek  the 
solution  of  the  Bayesian  recursion  relations  numerically  for  nonlinear 
or  non-Gaussian  systems. 

In  essence,  we  are  faced  with  the  problem  of  evaluating  multi- 

i  k-1 

dimensional  integrals.  Certainly,  the  determination  of  p(x,  (£  ) 

using  (11)  requires  an  integration.  The  calculation  of  the  filtering 
density  pO^lf^  using  (10)  is  seen  to  require  the  multiplication 
of  two  density  functions.  This  does  not  represent  a  difficult  task 
other  than  in  the  storage  requirements  that  are  implied  in  such  an 
operation.  However,  the  normalization  and  the  determination  of 
moments  requires  integration  of  the  product. 

We  shall  first  consider  the  solution  of  the  problem  from  a 
very  basic  point  of  view.  Clearly,  the  a  posteriori  density  is  a 
random  function  of  the  data.  When  a  measurement  realization  is 
available,  then  we  have  the  density  as  a  function  of  the  state  x^. 

To  emphasize  this  and  to  reduce  the  notational  complexity,  let  us 
make  the  following  conventions.  The  prediction  density  shall  be 


written  as 


P(*jj£kl)  =  *00  = 


p(n)  q(x  n)  dn 


(31) 


Using  (15),  this  becomes 

)  dn  .  (32) 

We  have  renamed  x^  as  x  and  x^  ^  as  n .  The  subscripts  denoting 
the  sampling  time  have  been  suppressed  since  they  play  no  active  role 
in  the  discussion.  That  is,  the  Bayesian  recursion  relations  have 
the  same  form  at  every  sampling  time. 


ir(x)  = 


J p(n)  q(il-fk(n) 


Next,  the  filtering  density  shall  be  rewritten  in  the  following 
manner: 

PQ^/^)  =  p(x)  =  ctt(x)  m|s_-hk(x)|  (53) 

where  c  is  the  normalization  constant,  r,  is  the  prediction  density 
as  in  (32),  and  m  is  the  density  of  z  given  x_.  The  measurement 
z  can  be  regarded  as  being  known. 


Consider  the  calculations  required  for  one  complete  stage  of 
the  recursion.  The  filtering  density  p  is  computed  as  the  product 
of  7t  and  m.  Note  that  the  calculations  are  started  at  t ^  with 
7T  equal  to  the  a  priori  density  p(x  )  .  Thus,  p(>c^|£^)/c^  is 
readily  formed  for  all  k.  The  normalization  constant  is  formed  as 
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The  integration  generally  must  be  accomplished  numerically.  It  is 

immediately  apparent  that  a  considerable  computational  burden  can 

be  avoided  if  c  is  not  determined.  If  one  is  interested  in 

-MAP 

obtaining  only  the  MAP  estimator  x  ,  then  c  does  not  have  to 

up 

be  found.  However,  if  the  mean-square  estimator  x  °  is  desired 
or  if  any  moments  of  the  distribution  are  to  be  computed,  then  an 
accurate  value  for  c  is  required. 

After  determining  p,  the  integrand  in  (32)  can  be  formed. 

The  prediction  density  tt  is  obtained  by  carrying  out  the  nonlinear 
convolution  indicated  in  (32).  Again,  it  is  generally  necessary  to 
resort  to  numerical  methods  to  determine  tt .  Since  tt  is  a  function 
of  x,  the  convolution  implies  that  a  large  number  of  numerical 
integrals  must  be  computed;  essentially,  an  integration  for  each 
possible  value  of  x  is  required. 

After  determining  p  and  tt,  it  is  natural  to  compute  moments 
of  the  a  posteriori  density.  As  noted  above,  the  minimum  mean-square 
estimate  is  provided  by  the  conditional  mean.  The  quality  of  the 
estimate  is  commonly  gauged  by  forming  the  conditional  covariance 
matrix.  Conceivably,  higher-order  moments  might  also  be  determined  as 
indicators  of  the  effect  of  the  nonlinearities  and  of  the  deviation 
of  the  a  posteriori  density  from  a  gaussian.  Of  course,  these  are 
not  ensemble  statistics  but  are  associated  with  a  specific  measure¬ 


ment  realization. 


i 


A  large  number  of  methods  for  the  evaluation  of  the  Bayesian 
recursion  relations  have  been  proposed  and  studied.  These  methods 
have  the  common  characteristic  that  the  calculations  are  performed 
after  defining  a  "grid."  The  grid  points  provide  a  finite  collec¬ 
tion  on  which  approximations  can  be  based.  Obviously,  these  points 
are  contained  in  a  finite  region  of  state  space  even  though  the 
integrations  generally  are  carried  out  over  infinite  intervals. 

Thus,  the  functions  must  be  such  that  there  is  negligible  probability 
mass  outside  of  the  region  containing  the  grid  points.  The  manner  in 
which  the  grid  is  defined  is  an  important  consideration  in  the 
development  of  an  algorithm. 

Let  us  consider  an  approach  to  the  evaluation  of  the  nonlinear 
convolution  (32)  .  Suppose  that  a  specific  value  is  prescribed 

for  x  so  the  integration  will  yield  a  well-defined  number.  The 
numerical  integration  of  (32),  essentially  requires  that  the 
integral  be  replaced  by  a  summation  involving  a  discretization  of 
the  integration-variable  n.  The  manner  in  which  the  grid  points  are 
defined  may  be  accomplished  arbitrarily  or  as  in  integral  part  of  the 
quadrature  method.  For  example,  in  an  nth-order  Gauss-Hermite 
quadrature,  the  grid  points  are  chosen  as  the  zeros  of  the  nth 
Hermite  polynomial.  Let  n ^ ,  ( j  =  1 ,  2 ,  . . . ,  1 }  denote  the 

^  grid  points  for  the  variable  n.  Furthermore,  suppose  that  x^ 
is  regarded  as  the  ith  grid  point  for  the  discretization  of  the 
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variable  x  into  points.  Then,  the  convolution  (32)  is 

replaced  by 

Nk-i 

PCxp  =  ajP(nj)  -  f(Hj))  i  i  =  1,2, . . .  ,Nk  .  (35) 

j=l 

The  coefficients  cu  represent  the  weighting  coefficients  of  the 
numerical  integration  scheme.  Clearly,  if  there  are  a  large  number 
of  grid  points  the  storage  and  computational  burden  can  be  enormous, 
even  for  present-day  digital  computers. 

Because  of  the  storage  and  computational  burden  implied  by 
solving  the  Bayesian  recursion  relations,  it  is  natural  to  seek  ways 
in  which  these  requirements  can  be  reduced.  Effectively,  the  non¬ 
linear  filtering  problem  can  be  regarded  in  this  completely  com¬ 
putational  context.  In  the  subsequent  discussion,  we  shall  review 
some  the  approaches  that  have  been  proposed,  summarize  the  types  of 
results  that  have  been  obtained,  and  make  suggestions  for  areas 
requiring  additional  investigation. 

The  earliest  and  by  far  the  most  extensively  applied  approach 
was  motivated  by  the  existence  of  the  general  solution  for  linear, 
gaussian  systems  (i.e.,  the  Kalman  filter).  In  this  case,  a  single 
grid  point  is  defined  at  each  sampling  time.  Then,  the  system  equations 
f  and  h  are  linearized  relative  to  the  grid  point.  This  approxi¬ 
mation  of  the  system  itself  implies  that  the  state  and  measurement 
perturbations  are  gaussian  so  the  Kalman  filter  can  be  applied  directly. 
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A  number  of  generalizations  to  include  higher  order  perturbations 
have  been  proposed.  We  shall  discuss  this  class  of  methods  in 
Section  VI.  Since  the  approximations  can  be  regarded  as  being  most 
accurate  in  some  neighborhood  of  the  single  grid  (or  reference) 
point,  we  shall  refer  to  them  as  local  methods.  More  recently,  a 
second  class  of  techniques  has  emerged  which  explicitly  attempt  to 
obtain  solutions  by  defining  a  grid  over  the  entire  region  containing 
significant  probability  mass.  This  class  shall  be  referred  to  as 
global  methods  and  is  discussed  in  Section  VII. 


VI.  LOCAL  NONLINEAR  FILTERING  METHODS 


Virtually  the  only  recursive  nonlinear  filtering  method  that 
has  seen  application  to  practical  problems  is  the  so-called  extended 
Kalman  filter.  In  this  approach,  a  single  grid  point  is  defined  at 
each  stage  and  the  system  is  linearized  relative  to  this  point.  If 
the  grid  point  is  chosen  as  the  "best"  estimate  (i.e.,  the  approxi¬ 
mation  of  the  conditional  mean),  the  resulting  estimator  is  called 
the  extended  Kalman  filter  [2 , 10] .  This  is  apparently  the  simplest 
possible  approach  since  it  involves  a  single  grid  point  and  linear 
equations  at  each  sampling  time.  In  addition,  the  grid  point  at  the 
kth  time  is  obtained  directly  from  the  previous  grid  point  and  the 
appropriate  system  equation.  It  is  also  a  most  crude  approximation 
and  its  validity  depends  heavily  on  the  quality  of  the  linear 
approximation . 
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Practical  experience  has  demonstrated  that  the  assumptions 
inherent  in  the  extended  Kalman  filter  are  often  valid  and  satisfactory 
results  are  often  obtained.  There  are  also  well-known  disadvantages 
and  difficulties  associated  with  the  application  of  the  extended 
Kalman  filter.  The  manifestation  of  these  difficulties  is  commonly 
referred  to  as  the  divergence  [2,10,16]  problem.  Divergence  is  said  to 
occur  when  the  actual  error  in  the  estimate  becomes  inconsistent  with 
the  error  covariance  matrix  approximation  provided  by  the  filter 
equations.  This  situation  arises  because  of  errors  in  the  filter 
model,  either  as  a  result  of  errors  in  the  basic  model  or  as  a 
result  of  the  linearization  errors. 

Experience  with  the  extended  Kalman  filter  in  a  variety  of 
applications  has  led  to  the  definition  of  a  number  of  subproblems 
that  may  have  to  be  solved  in  order  to  develop  a  useful  algorithm. 

A.  Filter  Initialization 

Before  utilizing  the  Kalman  filter,  it  is  often  necessary  to 
process  a  small  amount  of  data  to  obtain  reference  values  to  be  used 
in  the  linearizations.  Regardless  of  the  manner  in  which  it  is 
accomplished,  the  filter  must  be  initialized  with  suitable  values  for 
the  estimate  and  error  covariance  matrix  in  order  to  obtain  reasonable 
estimates  at  subsequent  times. 


B.  Form  of  the  Filter  Model 

The  linearization  errors  can  be  reduced  in  many  cases  by  the 
form  used  for  the  system  model.  The  choice  of  coordinate  system  can 
be  important  [ 17 3 *  Furthermore,  the  use  of  transformations  [18]  to 
obtain  models  which  are  more  easily  linearized  are  often  possible. 


C.  Iterative  Calculations 

To  improve  the  linearizations,  one  can  iterate  through  a  small 
amount  of  the  data  (e.g.,  one  sample  at  a  time)  and  use  improved 
estimates  in  the  linearizations  before  reprocessing  the  data  [ 1 0] . 


D.  Divergence  Control 

Divergence  often  occurs  because  the  model  does  not  adequately 
describe  the  system.  To  compensate  for  model  errors,  the  plant  and 
measurement  noise  covariance  matrices  or  the  Kalman  gain  matrix 
directly  can  be  increased.  This  has  the  effect  of  causing  the  error 
covariance  matrix  to  be  increased  and  in  a  way  to  cause  past  data  to  be 
discounted  relative  to  more  recent  samples.  A  large  number  of  methods 
have  been  devised  to  compensate  for  model  errors  [e.g.,  2,10,19]. 


E.  Second  Order  Filter 

In  our  Bayesian  context,  the  use  of  the  extended  Kalman  filter 
implies  that  the  a  posteriori  density  is  gaussian.  This  can  be  an 
extremely  poor  approximation  of  the  actual  density  function  if  all 
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possible  values  of  the  state  x  are  considered.  Other  local  (or 
perturbative)  schemes  have  been  devised  in  an  effort  to  improve  the 
quality  of  the  approximation.  The  obvious  extension  [20, 2l]  is  to 
consider  retaining  the  second-order  terms  in  the  expansion  of  the 
system  functions  f  and  h.  Commonly,  the  assumption  is  made  that 
the  a  posteriori  density  is  still  gaussian  even  with  the  presence  of 
the  second-order  terms.  This  assumption  is  made  in  order  to  overcome 
the  "moment  closure  problem"  which  is  discussed  briefly  below. 

For  the  purposes  of  discussion,  suppose  that  we  are  considering 
a  scalar,  second-order  system 


\  -  W-1  '  Vk-1  *  Vi  ■ 

zk  ■  \\  *  ek*k  *  vk  • 


(36) 

(37) 


Suppose  at  the  (k-l)th  sampling  time  that  we  know 


Vl1- 


k-1 


K-l  k-1  ’ 


and 


The  mean  value  E 


var(\.i|ik"1)  "pk-iik-i 


(38) 


(39) 


is  seen  to  be 


E^l-k'1]  **k|k-i  "  fk  Vi|k-i  *  Ek(pk.i|k-i  *  V-ilk-i) 


(40) 
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To  determine  the  variance,  we  note  that 


xk'E[xk|k-l]  =  xk  |  k- 1 

gkXk-l|k-l +  (fk+2gkXk-l|k-ljxk-l|k-l  "  gkPk-l|k-l  +  Wk-1 

SO 

var  (xk  I—**"1)  ■  Vl  *(£k  *  Vk-l|k-l)  Pk-1  Ik-1  <‘2> 

*  2sk(fk  *  2sk;k-i|k-i)  Vi|k-1 

*  sk(  Vl|k-1  '  pk-l|k-l  )  , 


^  represent  the  third  and  fourth  central 
moments  of  ^  given  z^  .  Thus,  the  calculation  of  var(x^|£  “  ) 
requires  knowledge  of  the  first  four  central  moments  of  ^  given 
For  this  example,  the  calculation  of  the  ith  moments  always 
requires  knowledge  of  the  2  ith  moments  at  the  preceding  time.  This 
implies  that  one  must  know  moments  of  every  order  and  is  referred  to 
in  ge  ieral  as  the  moment  closure  problem.  To  close  the  problem,  it 
is  common  to  assume  that  moments  of  order  greater  than  some  integer 
correspond  with  gaussian  moments.  For  example,  if  the  3rd  and  higher 
order  moments  are  assumed  to  be  gaussian,  then  for  all  k 


wnere 


Mk-1  |k  - 1 


and 


Vi|k 


(41) 
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F.  Higher  Order  Polynomial  Expansion 


The  first  serious  attempt  to  eliminate  the  gaussian  assumption 
involved  the  use  of  Gram-Char lier  or  Edgeworth  expansion  [22].  The 
expansion  is  a  series  of  polynomials  which  are  orthogonal  with 
respect  to  a  gaussian  distribution  and  can  be  used  to  represent  a 
wide  class  of  density  functions.  The  initial  use  of  this  non- 
gaussian  approximation  was  based  on  a  perturbative  approximation. 

As  a  consequence,  it  suffered  from  the  disadvantage  that  a  large 
number  of  terms  were  required  to  obtain  t  reasonable  approximation 
of  a  distinctly  nongaussian  density.  The  behavior  of  the  estimator 
obtained  from  this  density  approximation  was  found  to  be  very  sensi¬ 
tive  to  the  quality  of  the  approximation.  When  the  infinite  series 
is  truncated,  as  it  must  for  practical  application,  the  resulting 
series  can  become  negative  over  portions  of  the  state  space. 
Consequently,  the  density  approximation  is  not  itself  a  density. 

This  can  introduce  unexpected  influences  into  the  behavior  of  the 
estimator,  particularly  if  the  integral  over  the  region  in  which  the 
function  is  nonpositive  has  a  nontrivial  value.  Subsequently,  other 
density  approximations  using  the  Edgeworth  expansion  have  been  pro¬ 
posed  [e.g.,  23,24,34].  This  local  method  seems  to  be  most  useful  when 
the  a  posteriori  density  is  unimodal  even  though  it  is  not  Gaussian. 


» 
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G.  Parameter  Identification 


In  certain  cases  the  dynamic  system,  the  measurement  function, 
or  even  statistics  of  the  noise  or  initial  state  can  contain  unknown 
and  constant  elements.  Much  of  the  work  classified  as  system 
identification  addresses  this  problem  which  is  a  very  special  case 
of  the  nonlinear  estimation  problem.  Good  descriptions  of  these 
techniques  are  contained  in  references  [25,26]. 


VIII.  GLOBAL  NONLINEAR  FILTERING  METHODS 

The  obvious  disadvantage  of  the  local  methods  stems  from  the 
use  of  a  single  gTid  point  on  which  to  base  the  approximation. 

During  the  past  few  years,  several  methods  have  been  proposed  which 
attempt  to  improve  the  approximation  by  considering  the  density  at 
many  points  selected  through  the  region  containing  nonnegligible 
probability  mass.  These  methods  can  be  regarded  as  representing 
specific  examples  of  ways  in  which  the  numerical  integrations 
discussed  in  Section  V  can  be  accomplished.  Some  of  these  global 
approximations  are  reviewed  in  this  section. 

Quite  possibly,  the  first  step  toward  the  development  of  a 
global  method  was  taken  by  Magill  [27]  with  a  subsequent  generaliza¬ 
tion  by  Hilborn  and  Lainiotis  [28].  They  considered  linear  systems 
with  unknown  parameters.  To  deal  with  this  nonlinear  problem. 


a  grid  was  established  by  discretizing  the  unknown  parameters  and 
by  considering  the  resulting  collection  of  linear  filtering  problems. 

A  global  method  for  the  general  nonlinear  filtering  problem  was 
proposed  by  Bucy  [29]  when  he  introduced  the  point-mass  method. 

This  approach  was  elaborated  upon  by  Bucy  and  Senne  [30]  at  the 
First  Symposium  on  Nonlinear  Estimation  in  1970.  At  this  same 
meeting  Alspach  and  Sorenson  [3l]  proposed  the  gaussian  sum 
approximation  as  an  alternative  approach.  Subsequent  Symposia  on 
Nonlinear  Estimation  included  many  extensions  and  saw 
the  introduction  of  other  techniques.  Center  [32]  provided  a  unifying 
theoretical  framework  by  considering  the  problem  in  the  context  of 
generalized  least-squares.  His  approach  permits,  conceptually  at 
least,  the  development  of  a  countless  number  of  approximations. 

In  the  Second  Symposium  on  Nonlinear  Estimation,  Center  discussed 
as  specific  examples  the  point-mass,  gaussian  sum,  and  Edgeworth 
expansion  approximations.  Later  [33],  he  also  discussed  the  spline 
approximation  method  proposed  by  Jan  and  de  Figueiredo  [12]. 

All  specific  global  methods  must  provide  solutions  of  the 
following  general  problems. 

(a)  An  initial  grid  must  be  defined.  It  is  important  that  the 
region  encompassed  by  the  grid  includes  the  true  value  of  the  state. 

In  addition,  the  number  and  manner  in  which  the  grid  points  are 
distributed  within  the  approximation  region  must  be  defined. 


(b)  A  procedure  must  be  defined  for  defining  the  grid  at  each  subse¬ 
quent  sampling  time.  While  the  grid  could  be  the  same  throughout,  the 


dynamic  nature  of  the  problem  and  the  desire  for  computational  efficiency 
indicate  the  advisability  of  redefining  the  grid  at  each  sampling  time. 

(c)  Given  the  grid,  a  method  must  be  selected  for  approximating  the 
functions  and/or  for  carrying  out  the  Bayes1  rule  calculations.  The 
approximation  method  and  the  grid  selection  method  are  not  unrelated 
and  the  implementation  of  a  particular  method  may  require  interaction 
between  the  two  considerations. 


Below  we  will  show  in  some  detail  how  much  interaction  occurs 
for  one  of  the  methods  in  order  to  give  the  reader  a  feel  for  such 
interaction  in  one  particular  case.  The  other  methods  have  been  applied 
in  similar  problems  and  are  briefly  described  below  and  described  in 
detail  in  the  references. 


Alspach  and  Sorenson  [11,31,34,35]  proposed  approximating  the 
a  posteriori  density  function  by  a  weighted  sum  of  Gaussian  densities. 
For  example,  a  density  p  is  approximated  by  the  density*  p 

a 


q 

p  (x)  =  X  CX  .N  (x  .  ,P  . )  , 
i=l 


(45) 


where  the  weighting  coefficients  a.  are  nonnegative  and  X  Ct .  =  1 , 


i=l 


*Ny(a,B)  A  (2r)  n^2(det  B)  111  exp  {-  y  (x  -  a)iB  1  (x  -  a)} 


•1/2 


.-1 


w 


This  approximation  is  motivated  by  the  realization  that  p^  converges 
uniformly  to  p  for  a  large  class  of  densities.  Thus,  the  approximation 
can  be  made  as  accurate  as  one  wants  through  the  choice  of  q.  The 
idea  of  using  this  type  of  approximation  has  been  suggested  by  several 
others  [e.g.,  13,14,15,36]. 

After  q  has  been  defined,  it  is  necessary  to  assign  values  to 
the  parameters  cu,  x^,  ,  {i  =  1,  2,  q}.  The  mean  values 

represent  grid  points  for  the  approximation.  The  selection  of  all  of 
these  parameters  must  yield  a  satisfactory  representation  of  the 
a  posteriori  density.  It  is  natural  to  formulate  their  determination 
as  an  optimization  problem.  Let  us  choose  ou,  x^,  P^,  {i  =  1,  2,  q} 

so  that  the  generalized  least-squares  performance  index, 


€ 


LS 


Pa(x)]2 


dx , 


(46) 


is  minimized  subject  to  the  constraints  that  for  all  i,  cu  >  0,  Z  a.  =  1 
and  is  a  positive  semidefinite  matrix.  Figures  2  to  4  show  the 
approximations  resulting  in  fitting  three  different  scalar  densities 
by  such  a  direct  optimization  method.  These  densities  contain  most 
of  the  features  that  can  give  difficulty  in  the  various  density  func¬ 
tions  encountered  in  practice.  These  difficulties  include  discontinu¬ 
ities,  skewness,  unboundedness,  and  the  problem  of  converging  to  zero 
both  faster  and  slower  than  the  gausslan. 
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The  first  example  that  is  discussed  here  is  the  gamma  density 
function.  It  is  defined  as: 


P(x) 


for  x  <  0 
for  x  >  0 


(A7) 


This  distribution  has  a  mean  value  of  4  and  second,  third,  and  fourth 
central  moments  of  4,  8,  and  72,  respectively.  Figure  2  shows  the 
result  of  fitting  one  to  four  gaussians  to  this  density.  Note 
that  in  two  cases  several  moments  of  the  approximate  gaussian  sum 
density  were  constrained  to  match  the  moments  of  the  true  density. 

The  bad  effect  on  the  fit  indicates  difficulties  with  this  and 
other  moment  matching  techniques. 

The  second  density  approximated  is  a  uniform  density  function: 


p(x)  “  {  o 


-2<x<2 

elsewhere 


(48) 


The  obvious  symmetry  was  imposed  on  all  approximate  densities  in  order 
to  exactly  match  all  odd  moments.  The  results  of  fitting  2  to  5 
gaussians  to  this  density  are  shown  in  Figure  3.  Note  the  appearance 
of  a,fGibbs  phenomenon" on  the  last  approximate  density. 

The  last  density  approximated  and  reported  here  is  a  product 
of  two  independent  zero  mean  gaussian  random  variables. 
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z  »  xy 

(49) 

P(x)  - 

N(x,  4) 

(.50) 

p(y)  “ 

N(x,  4) 

(.51) 

p(z)  - 

KQ  (z/4)  /  4it 

(52) 

where  is  the  modified  bessel  function  of  the  second  kind  of  order 
zero.  Because  of  the  synmetry  of  this  density,  all  odd  moments  are 
zero.  The  second  and  fourth  central  moments  are: 

a2  -  E(z2)  -  16  and  E(z4)  *  2304  (53) 

This  density  and  one,  three,  and  five  gaussian  sum  approximations 
to  it  are  shown  in  Figure  A. 

The  development  of  such  approximations  requires  the  utilization  of 
numerical  search  techniques.  The  approximations  can  be  determined  off  line 
but  may  require  extensive  calculation.  This  approach  may  not  be  accep¬ 
table  in  many  circumstances.  Thus,  in  reference  [34],  an  alternate  sampler 
method  was  developed  and  entitled  the  "theorem  fit"  approach.  This  is 
done  as  follows: 

1.  The  number  of  Gaussian  terms  in  the  sum,  n,  is  chosen 
based  on  previous  experience. 

2.  The  region  (a,b)  over  which  the  density  function  is  to  be 
approximated  is  chosen. 

3.  The  mean  values  a^  for  each  Gaussian  are  placed  uniformly 
inside  (a,b). 
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4. 


The  weighting  functions  are  selected  to  be  propor¬ 


tional  to  the  value  of  the  density  function  to  be 
approximated  at  a^t  and  are  normalized  to  one. 

5.  The  value  of  the  variance  is  taken  to  be  Independent 
of  i  and  is  found  either  by: 

5.1  By  a  one-dimensional  search  to  minimize  the  L  ^ 
error; 

or 

5.2  Is  chosen  such  that,  O  *  z  —  ,  for  a  prespecified 

value  of  z. 

The  first  approach  is  called  the  "best  theorem  fit"  and  the 
second  the  '’smoothed  theorem  fit";  no  search  at  all  is  required  in 
this  second  method  of  obtaining  an  approximate  density. 

If  J  involves  more  than  one  region,  a  modification  of  this 
technique  has  been  used  which  treats  each  region  separately  and  takes 
the  number  of  terms  in  each  region  to  be  proportional  to  its  measure. 
Such  approximate  densities  for  the  uniform  and  gamma  densities  are 
shown  in  Figures  5  and  6.  In  these  figures  the  parameter  z  has 
been  chosen  to  be  .6. 

Gaussian  sum  densities  can  also  be  utilized  to  approximate 
densities  of  greater  than  one  dimension.  In  doing  this  it  is  possible 
to  take  into  account  natural  symmetries  of  the  density  to  be  approxi¬ 
mated.  For  example,  suppose  the  measurement  function  is  given  by: 


2 

z  "  h(x)  -  .1  Xj  +  (xj-x2)  + 

E(vk2)  -  .1 


The  measurement  function  is  then  given  by 


P(z|xj) 


N(z  -  .lxj  +  (xx-x2)  ,  .1) 


(54) 

(55) 


(56) 


or  for  the  particular  case  of  z  equal  to  1  this  function  is  shown  in 
Figure  7(a)*  If  the  a  priori  mean  x  is  zero 


A 

X 


(57) 


and  this  is  taken  as  the  linearization  point,  the  approximation  utilized 
in  an  extended  Kalman  filter  for  this  function  is  shown  in  Figure  7(b). 

A  simple  two-term  gaussian  sum  shown  in  Figure  7(c)  gives  a  far  better 
approximation  to  the  true  density  and  a  30- term  smoothed  theorem  fit 
density  shown  in  Figure  7(d)  captures  even  more  of  the  fine  details 
of  the  original  function. 

Another  example  of  using  a  gaussian  sum  density  to  approximate 
a  two-dimensional  function  arises  in  the  passive  bearing’s  only 
tracking  problem  (reference  [34]).  A  target  is  located  at  x^  or 
(xk,  yk)*  in  Figure  8(a)  and  is  observed  by  a  ship  at  location  S 
which  measures  the  angle  a.  This  geometry  is  shown  in  Figure  8(a). 

The  measurement  function  is 
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2  -  h^)  -  tan”1  [(yfc  -  sin  -  cos  6k>]  +  (58) 

E(vk^)  ■  o  ' ^  =  .01  radian^  (59) 

and  the  function  P(zk/xk)  is  shown  in  Figure  8(b).  If  an  extended 
Kalman  filter  linearizes  this  function  around  the  most  recent  line 
of  bearing  the  approximate  function  shown  in  Figure  8(c)  results. 

This  is  too  wide  if  the  target  is  closer  than  the  a  priori  estimate 
and  too  narrow  if  the  target  is  farther  from  the  origin  than  the 
original  estimate.  This  feature  of  the  approximation  can  lead  to 
’’range  collapse"  or  divergence  where  the  estimate  steps  to  the  origin. 
However,  the  general  shape  of  this  extended  Kalman  filter  approxima¬ 
tion  is  correct.  If  one  linearizes  about  the  last  estimated  position 
which,  however,  happens  not  to  lie  in  the  measured  bearing,  very  bad 
approximations,  away  from  the  linearization  point,  can  result.  Then 
one  gets  functions  which  bear  little  resemblance  to  the  true  mea¬ 
surement  function  just  as  in  the  last  example.  A  ten-term  gaussian 
sum  "smoothed  theorem  fit"  is  shown  in  Figure  8(d).  Note  in  Figures 
8(b)  and  8(d)  it  has  been  assumed  that  the  true  state  cannot  lie 
greater  than  six  orbital  radii  away  from  the  observer.  This  accounts 
for  the  cutoffs  on  both  figures. 

There  are  a  variety  of  ways  in  which  these  general  approxima¬ 
tions  can  be  utilized.  For  example,  if  the  a  priori  density  is 
approximated  by  a  Gaussian  sum  density  with  N  terms,  then  this 
defines  a  generalized  grid  in  the  initial  state  space.  If  the  plant 


5 


and  measurement  functions  are  linear  with  gaussian  perturbing  noise, 
then  the  evolution  of  each  term  in  the  a  priori  gaussian  sum  density 
is  described  by  a  linear  Kalman  filter.  For  example,  in  the  simple 
example  described  earlier  with  a  scalar  state  and  uniform  initial 
state  density,  the  evolution  of  the  true  density,  the  gaussian  sum 
approximate  density,  and  a  single  linear  Kalman  filter  density  with 
time  are  shown  in  Figure  9. 

In  this  case  only  the  a  priori  density  Pfx^)  -  tt(x)  needs  to  be 
approximated  by  a  gaussian  sum,  and  the  a  posteriori  density 
P^/z*)  can  easily  be  shown  to  be  a  gaussian  sum  with  N  terms. 

The  more  general  case  of  non-gaussian  plant  and  measurement 
noise  each  approximated  by  gaussian  sum  has  also  been  considered. 

In  this  case  the  a  posteriori  density  is  also  a  gaussian  sum  but 
the  number  of  terms  in  the  density  grows  with  the  number  of  stages. 
This  is  shown  in  detail  in  reference  [ll]. 


The  more  general  case  of  a  nonlinear  measurement  equation 
(z^  ■  is  considered  in  reference  [35].  Here,  in  order  to 

enfold  the  measurement  z,  it  is  necessary  to  multiply  tt(x)  by 
m(z-h(x)).  If  h  is  nonlinear,  the  product  is  no  longer  a  Gaussian  sum. 
One  approximation  at  this  point  is  to  linearize  h(x)  about  each  of 
these  grid  points.  Because  the  variance  of  each  term  of  the  sum 
is  small,  the  linearization  must  be  valid  only  in  a  small  region 
surrounding  the  grid  point.  The  extended  Kalman  filter  can  be 
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applied  at  each  grid  point  to  obtain  the  means  x^  and  covariances 
needed  in  the  gaussian  sum  approximation  of  the  filtering  density 
from  that  of  the  prediction  density.  The  extended  Kalman  filter  is 
used  to  obtain  a  grid  for  the  next  sampling  time  and  to  obtain  the 

t 

gaussian  sum  approximation  of  the  prediction  density  n.  Next, 
using  Eq.  32,  the  new  predicted  density  tt(x)  is  calculated.  If  the 
system  dynamics  are  linear,  then  this  can  be  solved  exactly  and  again 
the  predicted  density  is  a  Gaussian  sum.  If  not,  fk(x)  must  be 
linearized  about  each  grid  point  as  in  an  extended  Kalman  filter  and 
then  the  next  stage  density  is  again  in  a  Gaussian  sum  form.  A  simple 
quadratic  scalar  example  of  this  is  taken  from  reference  [35]. 


Consider  the  scalar  system  with  the  plant  described  by 

2 


*k+l  “  xk  +  nxk 


+  w. 


(60) 


The  state  x,  is  to  be  estimated  from  the  measurement  data  z  where 


Xk  +v 


o.i,-* 


(61) 


The  initial  state  and  the  plant  and  measurement  noise  sequences  are 


Independent,  white,  and  gaussian  with 

E(x0’)  -  1;  e[(xq  ■  &0’)2]  -  l  ; 

(62) 

E(wk)  -  E(vk)  -  0  ; 

(63) 

(64) 


2  2  2  2 
E("k  E<\  >  '  ' 

The  a  priori  mean  and  variance  of  the  initial  state  are  held  at  these 
values  for  all  examples  presented  here,  although  others  have  been 
investigated.  The  basic  parameters  of  the  system  in  the  present 
study  are  the  variances  of  the  plant  and  measurement  noise  and  the 
relative  effect  of  the  plant  nonlinearity  n-  These  variances  have 
been  chosen  to  be  independent  of  k  for  clarity  of  presentation  only. 
The  value  of  each  of  these  parameters  will  be  specified  for  each 
case  presented. 

Results  for  four  different  filters  are  presented  and  discussed, 
although  not  all  results  are  included  in  Figures  10  and  11.  When 
a  filter  performs  very  badly,  it  may  fall  off  the  scale  of  the  charts 
and  thus  not  be  shown.  The  first  three  are  filters  that  have  been 
considered  previously  in  the  literature  and  in  which  the  a  posteriori 
and  predicted  density  functions  are  assumed  to  be  gaussian  at  each 
stage.  The  first  of  these  is  the  extended  Kalman  filter.  This  is 
the  filter  most  often  used  in  practice.  The  second  filter  uses  one 
iteration  to  improve  the  reference  values  used  in  the  linearization. 
The  third  filter  is  the  gaussian  filter  of  [20],  where  second-order 
terms  are  used  to  modify  the  mean  and  variance  of  the  next  stage 
predicted  and  a  posteriori  density  functions.  The  fourth  is  the 
gaussiap  sum  filter. 
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The  characteristics  of  the  filtering  problem  depend  heavily  on 


the  position  of  the  state  variable  x^  with  respect  to  the  point  of 

symmetry  of  the  measurement  nonlinearity.  When  is  near  that  point 

2 

(zero  in  this  case),  the  ratio  x^  /ov  is  small  and  the  gaussian 
filters  tend  to  diverge.  As  the  state  moves  away  from  this  point, 
the  measurement  nonlinearity  becomes  increasingly  more  negligible 
and  the  gaussian  filters  tend  to  perforr  well.  This  is  particularly 
clear  when  there  is  no  plant  nonlinearity  n  *  0  and  no  plant  noise 
=*  0.  In  this  case  the  relative  performance  of  the  different 
filters  depends  most  strongly  on  the  value  of  the  state  variable 
and  less  on  the  particular  measurement  realization  under  considera¬ 
tion.  For  this  reason  it  was  found  best  with  a  limited  number  of 
realizations  to  choose  the  true  initial  value  of  state  as  a  parameter 
and  only  select  the  measurement  and  plant  noise  from  a  random  num¬ 
ber  generator.  This  was  particularly  useful  in  the  Monte-Carlo 
averages,  but  was  done  in  all  the  cases  presented  below. 

When  there  is  no  plant  nonlinearity  [r)  *  0  in  (60)],  it  is 
impossible  from  the  available  measurement  data  to  discriminate  between 
the  true  value  of  the  state  and  the  negative  of  that  value.  Thus 
p(XjJz^)  should  become  bimodal  if  the  value  of  the  state  is  nonzero. 
This  is,  of  course,  not  possible  for  any  of  the  gaussian  filters. 

When  there  is  no  plant  noise  or  nonlinearity,  the  a  posteriori 
density  can  be  computed  exactly.  Under  these  conditions  it  is 
(except  for  a  normalization  constant)  simply  given  by 


cPx0<xo)  Pv0(z0'  h^x0>>  *  *  *  Pvk(zk  '  h(xk})  *  (65) 


PCxJz*) 

The  density  function  of  a  specific  realization  is  depicted  in  Figure 
10.  The  values  of  the  system  parameters  are  stated  in  the  figure. 

The  gaussian  sum  filter  provided  an  approximation  that  is  indistin¬ 
guishable  from  the  true  a  posteriori  density  for  the  example.  In 
this  case  the  a  priori  density  p(xg)  was  approximated  by  a  sum  of 
40  gaussians.  Observe  that  the  second-order  filter  provides  an 
extremely  conservative  result  and  estimates  the  state  to  be  zero 
instead  of  ±0.2.  The  extended  Kalman  filter  tends  to  diverge. 

Only  the  iterated  filter  performs  at  all  satisfactorily  and  pro¬ 
vides  an  estimate  of  approximately  0.2. 

It  is  interesting  that  the  minimum  variance  estimate  that  one 
i  k 

would  obtain  from  p(x^|z  )  provides  an  estimate  that  is  between 
the  two  peaks  (i.e.,  since  the  conditional  mean  is  the  minimum 
variance  estimate).  Clearly,  this  estimate  is  very  conservative 
and,  consequently,  may  be  unsatisfactory.  A  maximum  likelihood 
estimate  would  yield  a  value  close  to  the  true  value  or  its 
negative. 

When  a  plant  nonlinearity  from  (61)  is  included,  it  is  possible 
to  distinguish  between  the  two  values  and  the  gaussian  sum  filter 
quickly  selects  the  proper  peak.  This  is  shown  in  Figure  11  where 
the  value  of  n  is  -0.2.  Since  the  state  has  a  negative  value,  the 
gaussian  filters  all  perform  unsatisfactorily,  so  only  the  results 
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of  the  gaussian  sum  filter  are  shown.  This  example  demonstrates  the 
difficulty  that  a  maximum  likelihood  estimator  might  encounter. 

It  is  observed  that  the  maximum  value  of  p(x^|z^)  switches  back  and 
forth  from  positive  to  negative.  Without  complete  knowledge  of  the 
density  function,  it  is  unlikely  that  a  procedure  could  be  devised 
that  would  reflect  this  behavior. 

In  the  previously  described  use  of  gaussian  sums,  the  gaussian 
sum  approximation  takes  the  form  of  a  number  of  extended  Kalman 
filters  operating  in  parallel.  It  is  easy  to  obtain  an  indication 
of  the  computational  burden  that  is  associated  with  this  nonlinear 
filter.  If  q  extended  Kalman  filters  are  required  at  each  stage 
of  the  sequence,  then  the  gaussian  sum  requires  approximately  q 
times  as  much  effort  as  a  single  filter.  The  burden  of  a  single 
filter  is  well  known  [e.g.,  37].  The  general  use  of  parallel  pro¬ 
cessors  in  this  problem  has  been  considered  by  several  authors 
[38  -  41]. 

In  some  problems  it  makes  more  sense  to  approximate  j 

or  P(z^/x^)  directly  as  a  gaussian  sum  at  each  stage.  In  this  case, 
the  number  of  terms  in  the  gaussian  sum  approximation  to  the 
a  posteriori  density  grows  with  the  number  of  stages.  It  is,  how¬ 
ever,  possible  to  drop  any  term  whose  weighting  coefficient,  a^, 
is  negligible.  It  is  also  often  possible  to  combine  two  or  more 
terms  whose  grid  points  from  prediction  fall  sufficiently  close 
together.  In  this  way  the  total  number  of  terms  in  the  gaussian 
sum  can  be  controlled. 


ORINCON 


♦ 


t 


An  example  of  this  type  is  the  vector  tracking  example  whose 
measurement  density  function  PCz^/x^)  was  described  earlier  in  Fig¬ 
ure  8  and  Equation  (58). 

The  state  vector  propagates  according  to  the  linear  plant 

*k+i  *  + 

and  the  state  is  observed  by  the  scalar  nonlinear  measurement 
function  of  Equation  (58)  where 


Bk  =  Bq  +  B(k  -  1) 

where  3^  and  8  are  given  constants.  The  a  priori  random  variables 
x0,  v  ,  and  w^  are  white,  independent,  gaussian  random  variables 
and  sequences. 

The  preceding  model  arises  in  connection  with  the  tracking 

T 

geometry  of  Figure  8  where  target  T  at  the  position  defined  = 

(x^  y^)  is  undergoing  a  random  walk  in  the  two-dimensional  state 

space.  The  observer  S  is  passively  measuring  the  line-of-sight  a  as 
it  travels  in  a  deterministic  orbit  around  the  unit  circle. 

Results  obtained  from  the  application  of  the  gaussian  sum 
filter  to  a  specific  example  are  shown  in  Figure  12.  The  position 
of  the  observer  is  shown  by  the  cross  on  the  unit  orbit  and  the 
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cross  on  the  density  function  shows  the  true  position  of  the  target. 
The  a  priori  estimate  for  the  initial  state  was  taken  to  be 


5  0 
0  1 


> 


while  the  true  value  of  the  initial  state  (and  all  subsequent 
values  since  there  is  no  plant  noise)  was  taken  to  be 


The  measurement  noise  has  a  one  sigma  value  of  0.01  rad  or  about 
one-half  degree.  The  non-gaussian  a  posteriori  filtering-density 
function  is  seen  to  propagate  from  stage  1  to  stage  9  in  this  figure 
where  a  measurement  is  taken  every  10°. 


In  Figure  13  results  obtained  using  the  extended  Kalman 
filter  and  the  gaussian  sum  filter  are  compared.  The  parameters 
€xkJ  €yk*  an(*  ^k  ^or  a  s^n8^-e  realization  are  presented.  The  improve¬ 
ment  provided  by  the  gaussian  sum  filter  is  striking. 


A  number  of  other  interesting  nonlinear  non-gaussian  stochastic 
dynamic  systems  have  been  investigated  utilizing  these  techniques 
in  the  literature,  and  the  reader  is  directed  there  for  more 


details  on  specific  problems. 


If  the  covariance  term  in  the  M theorem  fit"  method  of 


approximating  densities  is  made  very  small,  one  ends  up  with  gaussian 
shaped  delta  functions  with  a  weighting  equal  to  the  function  value 
at  that  point.  Such  an  approximation  is  a  very  bad  fit  to  the  density 
in  an  sense  but  distributions  and  moments  for  such  an  approximation 
can  be  arbitrarily  close  to  those  from  the  exact  density  as  the  density 
of  the  grid  increases.  In  this  way  one  can  move  from  the  gaussian 
sum  density  approximation  to  the  "point  mass"  approximation. 


The  point  mass  approximation  was  introduced  by  Bucy  and  Senne 
in  1970.  Bucy  [29]  and  Bucy  and  Senne  [30]  have  suggested  that  the  error 
covariance  matrix  be  used  to  establish  the  region  and  the  grid.  Essen¬ 
tially,  the  eigenvectors  are  used  to  define  the  principal  axes.  The 
grid  is  centered  at  the  mean  value.  The  grid  along  each  axis  was  chosen 
to  extend  over  a  distance  sufficient  (e.g.,  16  times  the  magnitude  of 
the  corresponding  eigenvalue)  to  insure  that  the  true  state  is  contained 
within  the  grid  region.  The  number  of  grid  points  is  prescribed  to 
provide  an  adequate  approximation.  The  basic  method  of  defining  the 
grid  is  modified  to  suit  the  requirements  imposed  by  a  particular 
problem.  For  example,  when  the  a  posteriori  density  is  multimodal, 
it  is  reasonable  to  define  a  grid  for  each  mode  rather  than  for  the 
entire  density. 
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The  manner  in  which  the  grid  is  updated  at  the  next  sampling 
time  is  straightforward  since  the  system  dynamics  provide  the  mean 
values  and  the  covariance  matrix  for  the  predictor  density. 

Once  the  grid  points  have  been  established,  the  density  func¬ 
tions  can  be  evaluated  at  each  of  them.  These  values,  after  being 
suitably  normalized,  can  be  regarded  as  point  masses  for  a  discrete 
approximation  of  the  distributions.  Using  the  point-mass  approxima¬ 
tion,  the  Bayesian  recursion  relations  are  readily  evaluated.  This 
approximation  is  essentially  equivalent  to  using  a  rectangular  integra¬ 
tion  rule  to  accomplish  the  numerical  quadratures. 

IX.  NONLINEAR  FILTERING — A  CRITICAL  LOOK 

Global  nonlinear  filtering  is  growing  beyond  its  infancy.  As 
must  be  true  for  any  infant,  the  first  steps,  as  exemplified  by 
the  work  mentioned  above,  are  exhilarating  for  those  involved  and 
can  easily  lead  to  overly  ambitious  claims  and  unwarranted  optimism* 
Viewed  with  even  a  modicum  of  perspective,  however,  it  becomes 
obvious  that  much  work  remains  before  the  infant  will  grow  to 
maturity.  It  is  fun  and, hopefully,  worthwhile  to  attempt  to  predict 
the  character  of  the  mature  development  and  to  suggest  some  activities 
that  are  required  to  shape  the  development. 


The  basic  objective  of  global  nonlinear  filtering  might  be 
regarded  as  the  development  of  a  practical  computational  algorithm 
which  will  permit  the  determination  of  the  a  posteriori  density  to 
any  prescribed  accuracy  for  any  system.  This  is  the  achievement 

> 

of  the  Kalman  filter  for  linear,  gaussian  systems;  if  it  can  be 
accomplished  for  nonlinear  non-gaussian  systems,  the  achievement 
would  be  worthy  of  any  of  the  scientific  titans  of  history.  The 
developments  described  above  do  provide  procedures  for  computing  the 
a  posteriori  density  for  any  system.  But  they  have  the  practical 
limitation  that  the  computational  requirements  associated  with 
their  implementation  are  enormous.  Thus,  the  development  of  an 
algorithm  must  be  guided  by  the  requirement  of  achieving  corapu- 
tational  efficiency.  With  the  rapid  development  of  mini-computers, 
it  appears  that  practical  nonlinear  filtering  may  be  possible  using 
special-purpose  rather  than  general-purpose  digital  computers.  It 
^  appears  reasonable  to  consider,  for  example,  the  use  of  mini¬ 

computers  for  parallel  processing.  Possibly,  some  of  the  general 
ideas  discussed  by  Korn  [42]  will  prove  useful. 

)  Assuming  that  global  nonlinear  filtering  methods  will  con¬ 

tinue  to  require  substantially  more  computation  than  local  filtering 
techniques,  it  is  natural  to  ask  and  attempt  to  answer  the  follow- 
t  ing  questions.  Under  what  conditions  is  it  desirable  and  necessary 

to  assume  the  additional  computational  burden  and  utilize  global 
nonlinear  filtering  techniques?  Certainly,  no  answer  to  this 


question  that  could  be  universally  accepted  exists  at  this  time.  How¬ 
ever,  some  related  considerations  can  be  discussed. 


Local  filtering  techniques  in  general  and  the 
extended  Kalman  filter  (EKF)  in  particular  are  looked  upon 
with  scorn  in  some  quarters  because  these  approaches  are  "sub- 
optimal."  In  addition,  the  degree  of  subopt imality  is  not  readily 
determined.  As  a  consequence,  it  is  reasonable  to  solve  the  global 
filtering  problem  if  only  to  provide  a  reference  against  which  local 
methods  can  be  compared.  However,  the  continued  use  of  the  EKF 
must  be  tolerated  because  it  has  proven  to  provide  satisfactory 
results  for  many  nonlinear  systems.  This  is  especially  true  when 
the  filter  is  designed  to  monitor  the  residuals  and  to  Initiate 
corrective  action  whenever  a  low  frequency  component  is  observed 
that  implies  the  onset  of  divergence. 

The  success  of  the  EKF  forces  a  search  for  general  circum¬ 
stances  in  which  this  local  filtering  method  cannot  be  expected  to 
perform  satisfactorily.  Certainly,  one  of  the  most  Important 
requirements  is  that  an  a  priori  estimate  be  available  which  permits 
the  local  approximation  to  be  valid  initially.  If  it  is  impossible 
to  define  an  appropriate  a  priori  estimate,  then  the  EKF  is  doomed 
to  failure  and  a  global  filter  is  required.  For  many  systems  of 
interest,  this  would  appear  to  be  an  unlikely  situation.  Frequently, 
the  signal- to-noise  ratio  is  sufficiently  large  that  a  reasonable 
estimate  can  be  obtained  using  only  deterministic  models.  When 
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more  than  one  solution  Is  possible,  physical  consideration  may  permit 
the  determination  of  the  only  reasonable  solution  which  can  then  be 
used  to  initiate  the  EKF.  If  more  than  one  a  priori  estimate  must 
be  considered,  the  a  posteriori  density  will  be  multimodal  so  the 
EKF  cannot  be  used. 

If  the  a  posteriori  density  can  be  regarded  as  unimodal  but 
non-gaussian,  the  EKF  must  produce  suboptimal  results.  Thus,  it 
may  be  desirable  to  utilize  local  or  global  procedures  which  elimi¬ 
nate  the  gaussian  assumption.  In  many  cases,  the  EKF  can  be  expected 
to  provide  pessimistic  results  since  the  gaussian  density  maximizes 
entropy.  As  long  as  the  residual  is  forced  to  be  white,  the  EKF 
should  produce  results  that  are  satisfactory  in  some  ways.  More 
complicated  procedures  may  provide  improvements  but  this  would  seem 
to  be  very  problem-dependent. 

Finally,  the  signal-to-noise  ratio  may  be  so  small  that  linear¬ 
izations  provide  inadequate  approximations  with  the  result  that  the 
EKF  produces  little  data  filtering.  That  is,  the  divergence  control 
logic  may  require  past  data  to  be  discounted  so  strongly  that  only 
current  data  is  used  in  determining  the  estimate.  Then,  the  estima¬ 
tion  error  will  be  comparable  or  greater  than  the  measurement  noise 
indicating  the  lack  of  any  filtering  (noise  removal)  activity.  In 
this  case,  global  nonlinear  filters  may  be  required  in  order  to 
extract  the  maximum  Information  from  the  data. 


Among  the  advantages  that  can  result  from  the  use  of  global 
nonlinear  filtering  are  the  following: 

(a)  It  is  not  necessary  to  have  a  priori  estimates  of  the  state 
that  are  sufficiently  accurate  to  validate  the  linearization.  Thus, 
the  problem  of  initializing  the  filter  is  eliminated. 

(b)  Situations  in  which  the  a  posteriori  density  is  multi¬ 
modal  are  handled  in  a  straightforward  manner.  The  consideration 

of  multimodality  enters  primarily  through  the  definition  of  the  grid 
and  the  choice  of  estimator  criterion. 

(c)  The  elimination  of  the  assumption  that  the  a  posteriori 
density  is  gaussian  can  permit  more  accurate  statistical  statements 

to  be  made.  A  simple  example  is  given  in  Ref.  [l9]  which  demonstrates 
the  insights  possible  from  knowledge  of  the  a  posteriori  density. 

(d)  Calculation  of  the  a  posteriori  density  provides  a  meaning¬ 
ful  reference  which  can  be  used  to  measure  the  performance  of  all 

k 

suboptimal  procedures.  The  accurate  calculation  of  p(x^/z  )  permits 
one  to  more  rationally  evaluate  the  effects  of  the  approximations 
used  in  suboptimal  estimators.  Generally,  even  suboptimal  estimators 
approach  the  optimal  response  of  the  global  filter  after  a  large 
quantity  of  data  has  been  processed*  The  difference  in  transient 
response  can  be  determined  and  can  provide  a  measure  of  the  adequacy 
of  a  particular  suboptimal  algorithm. 


In  the  study  of  nonlinear  filtering,  it  is  not  surprising  to 
find  that  there  are  few  analytical  results  and  closed-fora  solutions. 
Thus,  to  deal  with  these  problems,  it  is  natural  to  see  a  concentra¬ 
tion  of  effort  on  the  development  of  computational  procedures.  In 
this  sense,  the  field  is  similar  to  the  study  of  nonlinear  program¬ 
ming.  Unlike  the  latter,  we  do  not  have  standard  test  problems  nor 
extensive  numerical  studies  of  different  algorithms  which  have  been 
developed  for  the  same  general  problem.  It  seems  that  this  is  a  gap  that 
must  be  filled.  Several  problems  that  have  appeared  in  the  literature  and 
have  been  described  above  can  serve  as  candidates  for  standard  test 
problems.  Rational  criteria  for  comparing  algorithms  need  to  be 
established.  It  should  be  incumbent  upon  the  proposer  of  a  new  algo¬ 
rithm  to  provide  meaningful  comparisons  of  his  procedure  with  existing 
algorithms.  By  this  means  one  can  hope  to  establish  situations  in 
which  specific  algorithms  will  have  demonstrable  advantages. 

As  nonlinear  filtering  begins  to  see  practical  application,  a 
wealth  of  new  problems  will  be  uncovered  and  the  research  will 
progress  into  new  areas.  A  question  which  requires  Immediate  con¬ 
sideration  arises  when  we  contemplate  the  basic  assumptions  implicit 
in  the  Bayesian  recursion  relations.  This  solution  of  the  nonlinear 
filtering  problem  supposes  that  we  have  a  complete  probabilistic 
description  of  the  system.  In  practice,  one  often  considers  himself 
lucky  to  have  information  about  the  second  moments.  Thus,  it  is 
naive  to  believe  that  the  probabilistic  model  is  justified. 


Consequently,  It  is  Imperative  that  the  sensitivity  to  model  errors 
be  examined  in  considerable  detail.  On  one  hand,  it  might  be  possible 
to  reduce  the  computational  burden  associated  with  the  current  global 
filters  by  exploiting  the  knowledge  that  model  errors  exist.  On  the 
other  hand,  sensitivity  to  model  errors  might  indicate  the  folly  of 
the  Bayesian  approach  entirely  and  cause  the  redirection  of  research 
activities  into  less  model-dependent  formulations. 
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1.  The  true  value  of  one  sigma  and  the  Kalman  approximation 
to  it  versus  z^. 

2 

2.  One  to  four  Gaussians  in  L  search  fit  to  a  gamma  density. 

2 

3.  Gaussian  sum  approximation  to  uniform  density  L  search  fit 
2  to  5  terms. 

2 

4.  1,  3,  and  5  L  search  fit  approximations  to  product  density. 

5.  Gaussian  sum  approximations  of  uniform  density  functions. 

6.  6  and  10  term  Gaussian  sum  theorem  fit  approximation  to  a 
gamma  density. 

7.  Measurement  density  function  and  approximation. 

8.  The  passive,  bearings-only  tracking  problem. 

9.  Behavior  of  the  a  posteriori  density — true  and 
approximate. 

10.  Filtering  density  and  approximations.  Solid  line  is  true 
PDF.  Broken  line  is  Gaussian  sum,  x  x  is  second  order. 

+  • • •  +  is  iterated. 

11.  Gaussian  sum  approximation  to  filtering  density  for  nonlinear 
plant  and  measurement.  Solid  line  is  Gaussian  sum  PDF. 

o  •••  •  is  true  value  of  state,  Xq  *  -0.2,  n  *  -0.2, 

a  =  0,  and  a  *  0.05. 
w  v 

12.  Filtering  density  for  vector  tracking  example. 
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13.  Relative  performance  of  extended  Kalman  and  Gaussian  sum 
filters  for  tracking  problem.  Broken  line  denotes  Kalman. 
Solid  line  denotes  Gaussian  sum. 
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